Learning Visiolinguistic Representations with ViLBERT with Stefan Lee

EPISODE 358 | MARCH 19, 2020

About this Episode

Today we're joined by Stefan Lee, assistant professor in the School of Electrical Engineering and Computer Science at Oregon State University. Stefan, who we sat down with at NeurIPS this past winter, focuses on developing agents that can perceive their environment and communicate their understanding to humans so that they can coordinate their actions toward mutual goals. In our conversation, we focus on his paper ViLBERT: Pretraining Task-Agnostic Visiolinguistic Representations for Vision-and-Language Tasks, which presents a model for learning joint representations of image content and natural language. We talk through the development and training process for this model, how the BERT training process was adapted to incorporate visual information, and where this research leads in terms of integrating vision and language tasks. Finally, we discuss the importance of visual grounding.

About the Guest

Stefan Lee

Oregon State University
