Proactive Agents for the Web with Devi Parikh
EPISODE 756 | NOVEMBER 18, 2025
About this Episode
Today, we're joined by Devi Parikh, co-founder and co-CEO of Yutori, to discuss browser use models and a future where we interact with the web through proactive, autonomous agents. We explore the technical challenges of building reliable web agents and the advantages of visually grounded models that operate on screenshots rather than the browser’s more brittle Document Object Model (DOM), a counterintuitive choice that has proven far more robust and generalizable for handling complex web interfaces. Devi also shares insights into Yutori’s training pipeline, which has evolved from supervised fine-tuning to include rejection sampling and reinforcement learning. Finally, we discuss how Yutori’s “Scouts” agents orchestrate multiple tools and sub-agents to handle complex queries, the importance of background, “ambient” operation for these systems, and what the path looks like from simple monitoring to full task automation on the web.
About the Guest
Devi Parikh
Yutori
Resources
- Yutori
- Grad-CAM: Visual Explanations From Deep Networks via Gradient-Based Localization
- Emu: Enhancing Image Generation Models Using Photogenic Needles in a Haystack
- Emu Video: Factorizing Text-to-Video Generation by Explicit Image Conditioning
- Emu Edit: Precise Image Editing via Recognition and Generation Tasks
- Video Editing via Factorized Diffusion Distillation
- Fashion++: Minimal Edits for Outfit Improvement
- Qwen
- ChatGPT Atlas
- Comet Browser
- Dia Browser
- Llama 3
- Human-AI Collaboration for Creativity with Devi Parikh - #399
- Why Agents Are Stupid & What We Can Do About It with Dan Jeffries - #713
- Building Maps and Spatial Awareness in Blind AI Agents with Dhruv Batra - #629
