Learning Visiolinguistic Representations with ViLBERT with Stefan Lee
EPISODE 358 | MARCH 19, 2020
About this Episode
Today we're joined by Stefan Lee, assistant professor in the School of Electrical Engineering and Computer Science at Oregon State University.
Stefan, who we sat down with at NeurIPS this past winter, is focused on the development of agents that can perceive their environment and communicate their understanding with humans in order to coordinate their actions to achieve mutual goals.
In our conversation, we focus on his paper ViLBERT: Pretraining Task-Agnostic Visiolinguistic Representations for Vision-and-Language Tasks, which presents a model for learning joint representations of image content and natural language. We talk through the development and training process for this model, the adaptation of the training process to incorporate additional visual information into BERT models, and where this research leads from the perspective of integrating vision and language tasks. Finally, we discuss the importance of visual grounding.
About the Guest
Stefan Lee
Oregon State University
Resources
- Paper: ViLBERT: Pretraining Task-Agnostic Visiolinguistic Representations for Vision-and-Language Tasks
- Paper: Chasing Ghosts: Instruction Following as Bayesian State Tracking
- Paper: Deal or No Deal? End-to-End Learning for Negotiation Dialogues
- LoCoBot: An Open Source Low Cost Robot
- Conceptual Captions Dataset
- Paper: nocaps: novel object captioning at scale

