Video as a Universal Interface for AI Reasoning with Sherry Yang
EPISODE 676 | MARCH 18, 2024
About this Episode
Today we’re joined by Sherry Yang, a senior research scientist at Google DeepMind and PhD student at UC Berkeley. In this interview, we discuss her new paper, “Video as the New Language for Real-World Decision Making,” which explores how generative video models can play a role similar to that of language models in solving real-world tasks. Sherry draws an analogy: natural language serves as a unified representation of information, and text prediction serves as a common task interface. She then demonstrates that video as a medium and video generation as a task exhibit similar properties. This formulation enables video generation models to play a variety of real-world roles as planners, agents, compute engines, and environment simulators. Finally, we explore UniSim, an interactive demo of Sherry’s work and a preview of her vision for interacting with AI-generated environments.
About the Guest
Sherry Yang
Google DeepMind, UC Berkeley
Resources
- Paper: Video as the New Language for Real-World Decision Making
- Paper: Learning Interactive Real-World Simulators
- UniSim: Learning Interactive Real-World Simulators
- https://deepmind.google/technologies/alphago/
- https://openai.com/sora
- Reinforcement Learning for Industrial AI with Pieter Abbeel - #476
- Reinforcement Learning Deep Dive with Pieter Abbeel - #28