Mamba, Mamba-2 and Post-Transformer Architectures for Generative AI with Albert Gu
EPISODE 693 | JULY 16, 2024
About this Episode
Today, we're joined by Albert Gu, assistant professor at Carnegie Mellon University, to discuss his research on post-transformer architectures for multi-modal foundation models, with a focus on state-space models in general and Albert's recent Mamba and Mamba-2 papers in particular. We dig into the efficiency of the attention mechanism and its limitations in handling high-resolution perceptual modalities, as well as the strengths and weaknesses of transformer architectures relative to alternatives for various tasks. We also examine the role of tokenization and patching in transformer pipelines, emphasizing how abstraction and semantic relationships between tokens underpin the model's effectiveness, and explore how this relates to the debate between handcrafted pipelines and end-to-end architectures in machine learning. Additionally, we touch on the evolving landscape of hybrid models that incorporate elements of both attention and state, the significance of state update mechanisms for model adaptability and learning efficiency, and the contribution and adoption of state-space models like Mamba and Mamba-2 in academia and industry. Lastly, Albert shares his vision for advancing foundation models across diverse modalities and applications.
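As a rough illustration of the "state update mechanism" idea discussed in the episode, here is a minimal NumPy sketch of a selective state-space recurrence: per-timestep A, B, C parameters (which, in a selective SSM, are computed from the input) update a fixed-size hidden state that summarizes the sequence so far. The function name, shapes, and scalar-input simplification are illustrative assumptions, not the actual Mamba implementation.

```python
import numpy as np

def selective_ssm_scan(x, A, B, C):
    """Toy sequential scan of a selective state-space recurrence (sketch only).

    x: (T,) scalar input sequence.
    A, B, C: (T, N) per-timestep parameters; making them input-dependent is
    what lets the state update choose what to remember or forget.
    Returns y: (T,) outputs.
    """
    T, N = A.shape
    h = np.zeros(N)                 # fixed-size state carried across time
    y = np.zeros(T)
    for t in range(T):
        h = A[t] * h + B[t] * x[t]  # state update (diagonal/elementwise A)
        y[t] = C[t] @ h             # readout from the compressed state
    return y

# Example: 10 steps, state size 4, random illustrative parameters
rng = np.random.default_rng(0)
T, N = 10, 4
y = selective_ssm_scan(
    rng.standard_normal(T),
    rng.uniform(0.0, 1.0, (T, N)),  # A in (0, 1) keeps the state stable
    rng.standard_normal((T, N)),
    rng.standard_normal((T, N)),
)
print(y.shape)  # (10,)
```

Unlike attention, which retains the full sequence, this recurrence carries only the fixed-size state h between steps, which is the source of the efficiency trade-offs discussed above.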
About the Guest
Albert Gu
Carnegie Mellon University
Resources
- Mamba: Linear-Time Sequence Modeling with Selective State Spaces
- Transformers are SSMs: Generalized Models and Efficient Algorithms Through Structured State Space Duality
- Learning Fast Algorithms for Linear Transforms Using Butterfly Factorizations
- Efficiently Modeling Long Sequences with Structured State Spaces
- Improving the Gating Mechanism of Recurrent Neural Networks
- CKConv: Continuous Kernel Convolution For Sequential Data
- Hungry Hungry Hippos: Towards Language Modeling with State Space Models
- On the Parameterization and Initialization of Diagonal State Space Models
- MambaByte: Token-free Selective State Space Model
- FlashAttention: Fast and Memory-Efficient Exact Attention with IO-Awareness
- RWKV Language Model
- Jamba: A Hybrid Transformer-Mamba Language Model
- Zyphra Unveils Zamba: A Compact 7B SSM Hybrid Model
- Long Context Language Models and their Biological Applications with Eric Nguyen - #690
- Language Modeling With State Space Models with Dan Fu - #630
