Unifying Vision and Language Models with Mohit Bansal
EPISODE 636 | JULY 3, 2023
About this Episode
Today we're joined by Mohit Bansal, Parker Professor and Director of the MURGe-Lab at UNC Chapel Hill. In our conversation with Mohit, we explore the concept of unification in AI models, highlighting the advantages of shared knowledge and efficiency. He addresses the challenges of evaluation in generative AI, including biases and spurious correlations. Mohit introduces groundbreaking models such as UDOP and VL-T5, which achieved state-of-the-art results on a variety of vision-and-language tasks while using fewer parameters. Finally, we discuss the importance of data efficiency, evaluating bias in models, and the future of multimodal models and explainability.
About the Guest
Mohit Bansal
University of North Carolina
Resources
- Paper: VL-Adapter: Parameter-Efficient Transfer Learning for Vision-and-Language Tasks
- Paper: LXMERT: Learning Cross-Modality Encoder Representations from Transformers
- Paper: Unifying Vision-and-Language Tasks via Text Generation
- Paper: Unifying Vision, Text, and Layout for Universal Document Processing
- Paper: Vokenization: Improving Language Understanding with Contextualized, Visual-Grounded Supervision
- Paper: VidLanKD: Improving Language Understanding via Video-Distilled Knowledge Transfer
- Paper: Less is More: ClipBERT for Video-and-Language Learning via Sparse Sampling
- Paper: TVLT: Textless Vision-Language Transformer
- Paper: DALL-Eval: Probing the Reasoning Skills and Social Biases of Text-to-Image Generative Models
- Unified and Efficient Multimodal Pretraining Across Vision and Language
- Unified and Efficient Vision-and-Language Modeling
- How Deep Learning has Revolutionized OCR with Cha Zhang
- MBZUAI AI Quorum's Inaugural NLP Symposium
- ICML 2022 Pre-training Workshop
