Grokking, Generalization Collapse, and the Dynamics of Training Deep Neural Networks with Charles Martin
EPISODE 734 | JUNE 4, 2025
About this Episode
Today, we're joined by Charles Martin, founder of Calculation Consulting, to discuss WeightWatcher, an open-source tool for analyzing and improving deep neural networks (DNNs) based on principles from theoretical physics. We explore the foundations of the Heavy-Tailed Self-Regularization (HTSR) theory that underpins it, which combines random matrix theory and renormalization group ideas to shed light on model training dynamics. Charles walks us through WeightWatcher’s ability to detect three distinct learning phases (underfitting, grokking, and generalization collapse) and explains how its signature “layer quality” metric reveals whether individual layers are underfit, overfit, or optimally tuned. We also dig into the complexities of fine-tuning models, the surprising correlation between model optimality and hallucination, and the often-underestimated challenges of search relevance, along with what those challenges mean for RAG. Finally, Charles shares insights into real-world applications of generative AI and lessons learned from his work in the field.
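For readers who want a concrete feel for the “layer quality” idea, here is a minimal sketch, assuming the HTSR-style approach of fitting a power-law exponent alpha to the tail of a layer's eigenvalue spectrum (the eigenvalues of W^T W). This is not WeightWatcher's implementation: the function names `layer_esd`, `powerlaw_alpha`, and `classify_alpha` are hypothetical, the tail cutoff is a simple fixed fraction (the real tool chooses it more carefully), and the 2 and 6 thresholds are commonly quoted rules of thumb that should be treated as assumptions.

```python
import numpy as np


def layer_esd(W):
    """Sorted eigenvalues of X = W^T W / N for an N x M weight matrix (the layer's ESD)."""
    W = np.asarray(W, dtype=float)
    n = max(W.shape)  # normalization by the larger dimension; alpha is scale-invariant anyway
    sv = np.linalg.svd(W, compute_uv=False)  # singular values; their squares are eigenvalues
    return np.sort(sv ** 2 / n)


def powerlaw_alpha(eigs, tail_fraction=0.5):
    """Continuous power-law MLE on the largest eigenvalues: alpha = 1 + k / sum(log(x / xmin)).

    Assumption: the tail is simply the top `tail_fraction` of eigenvalues, and xmin is
    the smallest value in that tail (no goodness-of-fit search for xmin).
    """
    eigs = np.sort(np.asarray(eigs, dtype=float))
    k = max(2, int(len(eigs) * tail_fraction))
    tail = eigs[-k:]
    xmin = tail[0]
    return 1.0 + len(tail) / np.sum(np.log(tail / xmin))


def classify_alpha(alpha):
    """Hypothetical rule-of-thumb labels; the thresholds are assumptions, not a specification."""
    if alpha < 2.0:
        return "possibly over-fit"
    if alpha > 6.0:
        return "possibly under-fit"
    return "well-trained range"


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    W = rng.standard_normal((512, 256))  # a purely random layer has no true heavy tail
    alpha = powerlaw_alpha(layer_esd(W), tail_fraction=0.1)
    print(f"alpha = {alpha:.2f} -> {classify_alpha(alpha)}")
```

Because the estimator only uses ratios x / xmin, rescaling the weights leaves alpha unchanged, which is one reason a spectral exponent is attractive as a data-free diagnostic. Note that the MLE always returns a number, even for a random layer whose spectrum is not power-law at all, so a practical tool also needs some check of fit quality.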
About the Guest
Charles Martin
Calculation Consulting
Resources
- Calculation Consulting
- WeightWatcher
- Implicit Self-Regularization in Deep Neural Networks: Evidence from Random Matrix Theory and Implications for Learning (HTSR paper)
- Predicting trends in the quality of state-of-the-art neural networks without access to training or testing data
- SETOL: A Semi-Empirical Theory of (Deep) Learning
- NeurIPS2023 Invited Talk: Heavy-Tailed Self-Regularization in Deep Neural Networks
- WeightWatcher (WW)
- WeightWatcher DataFree Diagnostics for Deep Learning (Cohere Talk)
- The Leaderboard Illusion
- Book: How Nature Works: The Science of Self-Organised Criticality
- Do Language Models Use Their Depth Efficiently?
- SAM Models
- Llama-Guard Models
- Book: Why Stock Markets Crash
- The Bitcoin Crash and How Nature Works
- Annoy