Solving the Cocktail Party Problem with Machine Learning with Jonathan Le Roux
EPISODE 555 | JANUARY 24, 2022
About this Episode
Today we’re joined by Jonathan Le Roux, a senior principal research scientist at Mitsubishi Electric Research Laboratories (MERL). At MERL, Jonathan and his team use machine learning to tackle the “cocktail party problem”, working on the separation of speech from noise as well as the separation of speech from speech. In our conversation with Jonathan, we focus on his paper The Cocktail Fork Problem: Three-Stem Audio Separation For Real-World Soundtracks, which looks to separate and enhance a complex acoustic scene, splitting it into three distinct categories: speech, music, and sound effects. We explore the challenges of working with such noisy data, the model architecture used to solve this problem, how ML/DL fits into solving the larger cocktail party problem, future directions for this line of research, and much more!
About the Guest
Jonathan Le Roux
Mitsubishi Electric Research Laboratories
Resources
- Video: The Cocktail Fork Problem: Three-Stem Audio Separation for Real-World Soundtracks
- Paper: Single-Channel Multi-Speaker Separation using Deep Clustering
- Video: Jonathan Le Roux Plenary Talk JSALT 2020 Workshop
- Paper: Seamless Speech Recognition
- Video: SANE2019 | Jonathan Le Roux - Seamless ASR Demo
- Paper: Unsupervised Sound Separation Using Mixture Invariant Training
- Paper: Some experiments on the recognition of speech, with one and two ears
- Paper: Hierarchical Musical Instrument Separation
- Paper: Finding Strength in Weakness: Learning to Separate Sounds with Weak Supervision
- Paper: Deep clustering: Discriminative embeddings for segmentation and separation
- Paper: A Purely End-to-end System for Multi-speaker Speech Recognition