Today we’re joined by Stefan Lee, assistant professor in the School of Electrical Engineering and Computer Science at Oregon State University.
Subscribe: iTunes / Google Play / Spotify / RSS
Stefan, whom we sat down with at NeurIPS this past winter, focuses on developing agents that can perceive their environment and communicate their understanding to humans in order to coordinate their actions toward mutual goals.
In our conversation, we focus on his paper ViLBERT: Pretraining Task-Agnostic Visiolinguistic Representations for Vision-and-Language Tasks, which presents a model for learning joint representations of image content and natural language. We talk through the model's development and training process, how the BERT training procedure was adapted to incorporate visual information, and where this research leads in terms of integrating vision and language tasks. Finally, we discuss the importance of visual grounding.
Connect with Stefan!
Resources
- Paper: ViLBERT: Pretraining Task-Agnostic Visiolinguistic Representations for Vision-and-Language Tasks
- Paper: Chasing Ghosts: Instruction Following as Bayesian State Tracking
- Paper: Deal or No Deal? End-to-End Learning for Negotiation Dialogues
- LoCoBot: An Open Source Low Cost Robot
- Conceptual Captions Dataset
- Paper: nocaps: novel object captioning at scale
Join Forces!
- Join the TWIML Community!
- Check out our TWIML Presents: series page!
- Register for the TWIML Newsletter
- Check out the official TWIMLcon:AI Platform video packages here!
- Download our latest eBook, The Definitive Guide to AI Platforms!
“More On That Later” by Lee Rosevere licensed under CC By 4.0