Learning Transformer Programs with Dan Friedman

EPISODE 667 | JANUARY 15, 2024

About this Episode

Today, we continue our NeurIPS series with Dan Friedman, a PhD student in the Princeton NLP group. In our conversation, we explore his research on mechanistic interpretability for transformer models, specifically his paper, Learning Transformer Programs. The LTP paper proposes modifications to the transformer architecture that allow trained models to be converted directly into human-readable programs, making them inherently interpretable. We compare this approach with prior approaches to understanding transformer models and discuss where those fall short. We also dig into the approach's limitations in terms of functionality and scale.
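To give a flavor of what a "human-readable program" means here, below is a minimal, hypothetical sketch in Python. The select and aggregate primitives follow the RASP-style conventions this line of work builds on, where attention is expressed as explicit selection and averaging over positions; the reverse example and all names in the sketch are illustrative assumptions, not actual output from the paper's pipeline.

    # A hypothetical sketch of the kind of program that can be read off
    # from a transformer trained under the paper's constraints.
    # select/aggregate mirror RASP-style attention primitives (assumed names).

    def select(keys, queries, predicate):
        # Build a boolean attention pattern: entry [q][k] is True
        # wherever the predicate holds between key k and query q.
        return [[predicate(k, q) for k in keys] for q in queries]

    def aggregate(attn, values):
        # For each query position, average the values at selected positions.
        out = []
        for row in attn:
            selected = [v for v, keep in zip(values, row) if keep]
            out.append(sum(selected) / len(selected) if selected else 0)
        return out

    def reverse(tokens):
        # Reverse a sequence with one attention-like step: position q
        # attends to the position k satisfying k + q == len(tokens) - 1.
        n = len(tokens)
        positions = list(range(n))
        attn = select(positions, positions, lambda k, q: k + q == n - 1)
        return aggregate(attn, tokens)

    if __name__ == "__main__":
        print(reverse([1, 2, 3, 4, 5]))  # [5.0, 4.0, 3.0, 2.0, 1.0]

Because every step is an explicit, discrete rule rather than a dense weight matrix, a program like this can be inspected, debugged, and reasoned about directly, which is the sense in which the paper's models are inherently interpretable.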

About the Guest

Dan Friedman

Princeton
