About This Episode
Today we’re joined by Li Jiang, a distinguished engineer at Microsoft working on Azure Speech.
In our conversation with Li, we discuss his journey across 27 years at Microsoft, where he has worked on audio and speech recognition technologies, among other things. We explore his thoughts on the advances in speech recognition over the past few years, as well as the challenges and advantages of using end-to-end versus hybrid models.
We also discuss the trade-offs between delivering accuracy or quality and the runtime characteristics that you require as a service provider, in the context of engineering and delivering a service at the scale of Azure Speech. Finally, we walk through the data collection process for customizing a voice for TTS, which languages are currently supported, the responsibility of managing threats like deepfakes, the future of services like these, and much more!
Watch on YouTube
Thanks to our Sponsor!
Thanks to Microsoft for their support of the show and their sponsorship of this series of episodes highlighting just a few of the fundamental innovations behind Azure Cognitive Services! Cognitive Services is a portfolio of domain-specific capabilities that brings AI within reach of every developer, without requiring machine-learning expertise. All it takes is an API call to embed the ability to see, hear, speak, search, understand, and accelerate decision-making into your apps. Visit aka.ms/cognitive to learn how customers like Volkswagen, Uber, and the BBC have used Azure Cognitive Services to embed capabilities like real-time translation, facial recognition, and natural language understanding to create robust and intelligent user experiences in their apps. While you’re there, you can take advantage of the $200 credit to start building your own intelligent applications when you open an Azure Free Account.