Dynamic Token Merging for Efficient Byte-level Language Models with Julie Kallini
EPISODE 724 | MARCH 24, 2025
About this Episode
Today, we're joined by Julie Kallini, a PhD student at Stanford University, to discuss her recent papers, “MrT5: Dynamic Token Merging for Efficient Byte-level Language Models” and “Mission: Impossible Language Models.” For the MrT5 paper, we explore the importance and failings of tokenization in large language models, including inefficient compression rates for under-resourced languages, and dig into byte-level modeling as an alternative. We discuss the architecture of MrT5, its ability to learn language-specific compression rates, its performance on multilingual benchmarks and character-level manipulation tasks, and the resulting efficiency gains. For the “Mission: Impossible Language Models” paper, we review the core idea behind the research, the definition and creation of impossible languages, the construction of impossible-language training datasets, and explore the bias of language model architectures towards natural language.
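To make the dynamic token merging idea discussed in the MrT5 segment more concrete, here is a minimal sketch in PyTorch. It is not MrT5's actual implementation; the gate design, the threshold, and all names (`TokenMergeGate`, `keep_threshold`) are illustrative assumptions. The point is simply that a small learned gate can score byte-level positions and drop low-scoring ones, so that later encoder layers process a shorter sequence.

```python
# Illustrative sketch only, not MrT5's actual code: a learned "delete gate"
# inside a byte-level encoder that scores each position and drops the
# low-scoring ones, shortening the sequence for subsequent layers.
import torch
import torch.nn as nn

class TokenMergeGate(nn.Module):
    """Scores each position; positions at or below `keep_threshold` are dropped."""
    def __init__(self, d_model: int, keep_threshold: float = 0.5):
        super().__init__()
        self.score = nn.Linear(d_model, 1)
        self.keep_threshold = keep_threshold

    def forward(self, hidden: torch.Tensor):
        # hidden: (batch, seq_len, d_model) byte-level hidden states
        gate = torch.sigmoid(self.score(hidden)).squeeze(-1)  # (batch, seq_len)
        keep = gate > self.keep_threshold                     # boolean keep mask
        # Scale kept states by their gate value so the gate receives gradient;
        # the deletion itself is a hard mask in this toy version.
        return hidden * gate.unsqueeze(-1), keep

# Toy usage: one "sentence" of 16 byte positions, model width 32 (batch size 1,
# so boolean indexing below yields a single shortened sequence).
gate = TokenMergeGate(d_model=32)
hidden = torch.randn(1, 16, 32)
scaled, keep = gate(hidden)
shortened = scaled[keep].unsqueeze(0)  # kept positions only
print(f"compressed {hidden.shape[1]} positions down to {shortened.shape[1]}")
```

In the same spirit, here is a toy picture of how an "impossible language" training dataset might be built: apply a deterministic perturbation to natural sentences to produce a counterfactual corpus. The function name and the choice of full token reversal are illustrative assumptions, not the paper's exact procedure.

```python
# Toy illustration (assumptions, not the paper's exact procedure) of deriving
# an "impossible" counterpart of a natural sentence via a deterministic
# perturbation, here full token reversal.
def reverse_perturbation(tokens: list[str]) -> list[str]:
    """Map a natural sentence to an 'impossible' one by reversing token order."""
    return tokens[::-1]

natural = "the cat sat on the mat".split()
impossible = reverse_perturbation(natural)
print(" ".join(impossible))  # -> "mat the on sat cat the"
```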
About the Guest
Julie Kallini
Stanford University
Resources
- MrT5: Dynamic Token Merging for Efficient Byte-level Language Models
- Mission: Impossible Language Models
- Charformer: Fast Character Transformers via Gradient-based Subword Tokenization
- CANINE: Pre-training an Efficient Tokenization-Free Encoder for Language Representation
- Hierarchical Transformers Are More Efficient Language Models
- XNLI: Evaluating Cross-lingual Sentence Representations
- TyDi QA: A Benchmark for Information-Seeking Question Answering in Typologically Diverse Languages
- Byte Latent Transformer: Patches Scale Better Than Tokens
- Priorless Recurrent Networks Learn Curiously
