The Race to Production-Grade Diffusion LLMs with Stefano Ermon
EPISODE 764 | MARCH 26, 2026
About this Episode
Today, we're joined by Stefano Ermon, associate professor at Stanford University and CEO of Inception Labs, to discuss diffusion language models. We dig into how diffusion approaches, traditionally used for images, are being adapted for text and code generation, the technical challenges of applying continuous methods to discrete token spaces, and how diffusion models compare to traditional autoregressive LLMs. Stefano introduces Mercury 2, a commercial-scale diffusion LLM that can generate multiple tokens simultaneously and achieve inference speeds 5-10x faster than small frontier models, paving the way for latency-sensitive applications like voice interactions and fast agentic loops. We also cover the open research challenges in diffusion LLM training, serving infrastructure requirements, and post-training for diffusion-based systems. Finally, Stefano shares his perspective on whether diffusion models can rival or surpass autoregressive LLMs at scale, the advantages for highly controllable generation, and what the future of multimodal diffusion models might look like.
About the Guest
Stefano Ermon
Stanford University; Inception
Resources
- Inception
- Ermon Group
- Ermon Group Blog
- Introducing Mercury 2
- Introducing Mercury, the World’s First Commercial-Scale Diffusion Large Language Model
- Mercury: Ultra-Fast Language Models Based on Diffusion
- Copilot Arena
- Solving Inverse Problems in Medical Imaging with Score-Based Generative Models
- Cosmos World Foundation Model Platform for Physical AI
- Midjourney
- Gemini Diffusion
- Gemini 3.1 Pro
- Introducing Claude Opus 4.6
- GPT-4o mini: advancing cost-efficient intelligence
- Introducing computer use, a new Claude 3.5 Sonnet, and Claude 3.5 Haiku
- Gemini 2.0: Flash, Flash-Lite and Pro
- OpenClaw
- Bytedance Seed
- Large Language Diffusion Models
- SGLang
- TensorFlow/TensorRT integration
- Kilo Code
- Cline Bot
- Domain Knowledge in Machine Learning Models for Sustainability with Stefano Ermon - #15
