AI Trends 2024: Computer Vision with Naila Murray
EPISODE 665 | JANUARY 2, 2024
About this Episode
Today we kick off our AI Trends 2024 series with a conversation with Naila Murray, director of AI research at Meta. In our conversation with Naila, we dig into the latest trends and developments in computer vision. We explore advancements in controllable generation, visual programming, 3D Gaussian splatting, and multimodal models, specifically vision plus LLMs. We discuss tools and open source projects, including Segment Anything, a tool for versatile zero-shot image segmentation using simple text prompts, clicks, and bounding boxes; ControlNet, which adds conditional control to Stable Diffusion models; and DINOv2, a visual encoding model enabling object recognition, segmentation, and depth estimation, even in data-scarce scenarios. Finally, Naila shares her view on the most exciting opportunities in the field, as well as her predictions for the coming years.
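As a concrete illustration of the promptable segmentation workflow mentioned above, here is a minimal sketch using Meta's open source segment-anything package. It assumes a locally downloaded ViT-H checkpoint; the image path, click coordinates, and bounding box are illustrative placeholders, not values from the episode.

```python
# A minimal, hypothetical sketch of promptable segmentation with Meta's
# open source segment-anything package (pip install segment-anything).
# The checkpoint file, image path, click coordinates, and box below are
# placeholders, not values discussed in the episode.
import numpy as np
from PIL import Image
from segment_anything import sam_model_registry, SamPredictor

# Load a pretrained SAM backbone (ViT-H) from a downloaded checkpoint.
sam = sam_model_registry["vit_h"](checkpoint="sam_vit_h_4b8939.pth")
predictor = SamPredictor(sam)

# Embed the image once; prompts are then resolved against this embedding.
image = np.array(Image.open("photo.jpg").convert("RGB"))
predictor.set_image(image)

# Prompt with a single foreground click (label 1) plus a bounding box
# in XYXY pixel coordinates.
masks, scores, _ = predictor.predict(
    point_coords=np.array([[320, 240]]),
    point_labels=np.array([1]),
    box=np.array([200, 150, 450, 380]),
    multimask_output=True,  # return several candidate masks
)
best_mask = masks[np.argmax(scores)]  # boolean (H, W) mask array
```

Because the heavy image encoder runs only once per image in set_image, subsequent click and box prompts are cheap to evaluate, which is what makes this kind of interactive, zero-shot segmentation practical.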
About the Guest
Naila Murray
Meta
Resources
- Paper: Adding Conditional Control to Text-to-Image Diffusion Models
- Versatile Diffusion: Text, Images and Variations All in One Diffusion Model
- Pix2Video: Video Editing using Image Diffusion
- Zero-Shot Spatial Layout Conditioning for Text-to-Image Diffusion Models (ZestGuide)
- Visual Programming: Compositional visual reasoning without training
- Paper: ViperGPT: Visual Inference via Python Execution for Reasoning
- ViperGPT (code)
- Paper: 3D Gaussian Splatting for Real-Time Radiance Field Rendering
- Dynamic 3D Gaussians
- Paper: Visual Instruction Tuning
- Paper: Visual ChatGPT: Talking, Drawing and Editing with Visual Foundation Models
- Segment Anything
- ControlNet
- DINOv2
- DINOv2: A Self-supervised Vision Transformer Model
- Blog: The State of Computer Vision at Hugging Face
- Blog: LINGO-1: Exploring Natural Language for Autonomous Driving
- Paper: PaLM-E: An Embodied Multimodal Language Model
- Paper: RT-2: Vision-Language-Action Models Transfer Web Knowledge to Robotic Control
- Blog: Introducing Habitat 3.0: The next milestone on the path to socially intelligent robots
- Paper: Tuning computer vision models with task rewards
- Dual-Guided Brain Diffusion Model: Natural Image Reconstruction from Human Visual Stimulus fMRI
- @aivanlogic tweet
- CLIP: Connecting text and images
- DALL·E: Creating images from text
- Voyager: An Open-Ended Embodied Agent with Large Language Models
- Paper: Toolformer: Language Models Can Teach Themselves to Use Tools
- Scaling GAIA-1: 9-billion parameter generative world model for autonomous driving
- Learning Representations for Visual Search with Naila Murray - #190