Scaling Up Test-Time Compute with Latent Reasoning with Jonas Geiping
EPISODE 723 | MARCH 17, 2025
About this Episode
Today, we're joined by Jonas Geiping, research group leader at the ELLIS Institute and the Max Planck Institute for Intelligent Systems, to discuss his recent paper, “Scaling up Test-Time Compute with Latent Reasoning: A Recurrent Depth Approach.” This paper proposes a novel language model architecture that uses recurrent depth to enable “thinking in latent space.” We dig into “internal reasoning” versus “verbalized reasoning”—analogous to non-verbalized and verbalized thinking in humans—and discuss how the model searches in latent space to predict the next token and dynamically allocates more compute based on token difficulty. We also explore how the recurrent depth architecture simplifies LLMs, the parallels to diffusion models, the model's performance on reasoning tasks, the challenges of comparing models with varying compute budgets, and architectural advantages such as zero-shot adaptive exits and natural speculative decoding.
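To make the recurrent-depth idea concrete, here is a minimal toy sketch: instead of a fixed stack of layers, a single recurrent block is applied repeatedly to a latent state, and iteration stops once the state converges, so easy inputs exit early while harder ones consume more test-time compute. All names, shapes, and the convergence criterion below are illustrative assumptions for this sketch, not the paper's actual implementation.

```python
import numpy as np

rng = np.random.default_rng(0)
D = 16  # toy latent dimension

# Small random weights so the recurrent update tends to be contractive.
W = rng.normal(scale=0.1, size=(D, D))

def recurrent_block(state, embedding):
    # One "depth step": mix the current latent state with the input embedding.
    return np.tanh(state @ W + embedding)

def latent_reasoning(embedding, max_steps=64, tol=1e-4):
    """Iterate the recurrent block until the latent state converges.

    The step count varies per input: this is the adaptive-exit idea,
    where compute scales with the difficulty of the current token.
    """
    state = np.zeros(D)
    for step in range(1, max_steps + 1):
        new_state = recurrent_block(state, embedding)
        if np.linalg.norm(new_state - state) < tol:
            return new_state, step  # early (zero-shot adaptive) exit
        state = new_state
    return state, max_steps

final_state, steps = latent_reasoning(rng.normal(size=D))
print("steps used:", steps)
```

In the paper's framing, the converged latent state would then be decoded into next-token logits; the per-input step count is what makes the test-time compute budget dynamic rather than fixed at training time.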
About the Guest
Jonas Geiping
ELLIS Institute and Max Planck Institute for Intelligent Systems Tübingen
Resources
- Scaling up Test-Time Compute with Latent Reasoning: A Recurrent Depth Approach
- Universal Transformers
- ALBERT: A Lite BERT for Self-supervised Learning of Language Representations
- MoEUT: Mixture-of-Experts Universal Transformers
- OLMo models
- Coercing LLMs to Do and Reveal (Almost) Anything with Jonas Geiping - #678
- Inside s1: An o1-Style Reasoning Model That Cost Under $50 to Train with Niklas Muennighoff - #721
- Speculative Decoding and Efficient LLM Inference with Chris Lott - #717
