This Week in Machine Learning & AI


    From Particle Physics to Audio AI with Scott Stephenson


    This week my guest is Scott Stephenson. Scott is co-founder & CEO of Deepgram, a startup developing an AI-based platform for indexing and searching audio and video. Scott and I cover a ton of interesting topics including applying machine learning techniques to particle physics, his time in a lab two miles below the surface of the earth, applying neural networks to audio, and Kur, the deep learning framework that his company developed and open-sourced.

    Get Your Stickers

    Yep, we’ve got some nice new stickers for the podcast, and we want to give you one! We’re continuing the sticker contest we kicked off last week: send us your favorite quote from today’s show via a comment or post on Facebook, Twitter, YouTube, or SoundCloud, or right here on this show notes page, and we’ll send a sticker your way!

    Future of Data Summit



    I invite you to join me at the Future of Data Summit, an event I’m organizing in conjunction with the Interop ITX conference in Las Vegas. At the event you’ll hear from industry leaders and technology users about how they’re taking advantage of emerging data-centric technologies like IoT, blockchain, deep learning, and more. To learn more about the Summit, visit twimlai.com/futureofdata. The discount code on that page, which is simply my last name, CHARRINGTON, gives TWiML listeners 20% off when they register for Interop.

    About Scott Stephenson

    Mentioned in the Interview

    13 comments
    • Evan Oman

      I loved Scott’s view on OSS:

      “There is a ton of demand for talent… You don’t have to be secretive. You don’t have to be like: ‘This is our secret sauce.’ Everyone is talent limited, they’re computationally limited, they are data limited. They are not good idea limited.”

    • Pat

      Fantastic interview! I loved how they described their approach. I’m also curious how Deepgram compares to other deployed commercial solutions; I’m thinking of Microsoft OneNote’s audio indexing and search capability.

      • sam

        I’ll point Scott here to chime in, but I can think of two differences: The first is the ability to build your own applications around the capability via their APIs, and the second is the ability to augment the system with your own training data to yield greater accuracy on the types of things your audio is about. Oh, and support for video would be a third. @Scott?

      • Scott Stephenson

        Great question about OneNote!

        The overall technique they use is closer to what we do at Deepgram than standard speech-to-text search is. As far as I can tell, OneNote generates a phonetic index, which works well in certain areas where speech-to-text doesn’t, but there are tradeoffs, usually where context matters: with groups of short words, a phoneme search can match a lot of false positives, or mistake fragments of larger words for the query. At Deepgram, we do something akin to phoneme search in the CNN stack, and then the RNN, which can keep track of context, cleans up the results to remove false positives (a rough sketch of this kind of pipeline appears below this thread).

        Scale is also a big concern at DG. The computational cost of doing it the OneNote way is pretty steep: it takes a couple of hours to index an hour of recorded audio (the work is offloaded to the client computer), whereas Deepgram indexes an hour of audio in about a minute. So we can tackle large datasets without breaking the bank. 🙂

        • Pat

          Scott,

          Thank you for the fantastic follow-up. That definitely helped clarify the differences between the implementations, and it illustrates the challenge of dealing with false positives in this domain. I appreciate you (and the TWiML&AI team) providing such detailed responses on this blog!
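
    For readers wondering what the pipeline Scott describes might look like in code, here is a minimal sketch of a CNN-then-RNN model: the convolutional stack acts as a rough per-frame phoneme detector, and the recurrent layer uses surrounding context to suppress false positives. The framework, layer sizes, and feature choices here are illustrative assumptions, not Deepgram’s actual architecture.

    ```python
    # A minimal, illustrative CNN-then-RNN phoneme spotter (assumed sizes,
    # not Deepgram's production model). The CNN reacts to local acoustic
    # patterns; the RNN uses context across frames to clean up its output.
    import torch
    import torch.nn as nn

    class PhonemeSpotter(nn.Module):
        def __init__(self, n_mels=80, n_phonemes=40, hidden=256):
            super().__init__()
            # CNN stack: local "phoneme-ish" pattern detectors over spectrogram frames
            self.cnn = nn.Sequential(
                nn.Conv1d(n_mels, 128, kernel_size=5, padding=2), nn.ReLU(),
                nn.Conv1d(128, 128, kernel_size=5, padding=2), nn.ReLU(),
            )
            # Bidirectional RNN: integrates surrounding context to suppress
            # the false positives a purely local phoneme match would produce
            self.rnn = nn.GRU(128, hidden, batch_first=True, bidirectional=True)
            self.head = nn.Linear(2 * hidden, n_phonemes)  # per-frame phoneme scores

        def forward(self, mel):          # mel: (batch, n_mels, frames)
            x = self.cnn(mel)            # (batch, 128, frames)
            x = x.transpose(1, 2)        # (batch, frames, 128)
            x, _ = self.rnn(x)           # (batch, frames, 2 * hidden)
            return self.head(x)          # (batch, frames, n_phonemes)

    model = PhonemeSpotter()
    scores = model(torch.randn(1, 80, 1000))  # ~10 s of dummy mel features
    print(scores.shape)                       # torch.Size([1, 1000, 40])
    ```

    Per-frame scores like these are the kind of intermediate representation that can be computed once at ingest time, stored in an index, and searched cheaply afterwards.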

    • Simon Hudson

      Hi there, I’ve been listening for some weeks now and love the show.

      I’m a web developer, DevOps engineer, and physics student, and I loved this episode for Scott Stephenson’s comments on the machine learning techniques that apply both to his previous work in particle physics and to his current work at Deepgram.

      My favourite quote of his is: “AI is kind of just information physics” – awesome, and so true.

      I’ve also had the privilege of contributing to Deepgram’s Kur framework via their public repository at https://github.com/deepgram/kur, and I hope to start integrating this fascinating tool into my own projects soon.

      Thanks again and keep up the good work!
      Simon

    • Abdurrahman Ahmed

      Thank you very much for the great episode!

      Scott mentions that instead of indexing the automatic transcription of the speech, the activations deep in the NN are indexed. How is the link then made to the text query? Do you automatically convert the text of the query to speech and search for the activations of that? (A speculative sketch of this idea appears after the comments.)

    • deepa

      Hi Scott, for keyword spotting, do you use the fully connected layer followed by the CTC stage after the RNN, or do you build the index at the output of the RNN?
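
    On Abdurrahman’s query-side question above, here is one purely speculative sketch of the idea he floats: map the text query to a phoneme sequence (via a lexicon or a grapheme-to-phoneme model), then slide that sequence across the per-frame activations stored at ingest time, looking for high-scoring spans. The lexicon, phoneme set, and scoring below are all assumptions for illustration, not confirmed Deepgram behavior.

    ```python
    # Speculative query-side sketch: text query -> phoneme ids -> sliding
    # match against stored per-frame activations. All names here are
    # hypothetical; a real system would use a trained G2P model and a
    # proper alignment (e.g., DTW or CTC-style scoring).
    import numpy as np

    PHONEMES = ["D", "IY", "P", "G", "R", "AE", "M", "SIL"]        # toy phoneme set
    LEXICON = {"deepgram": ["D", "IY", "P", "G", "R", "AE", "M"]}  # toy G2P lookup

    def query_scores(word, index):
        """index: (frames, n_phonemes) activation matrix built at ingest time."""
        ids = [PHONEMES.index(p) for p in LEXICON[word]]
        n = len(ids)
        # Score each window by how strongly successive frames activate the
        # expected phonemes in order (crude one-frame-per-phoneme alignment).
        return np.array([
            index[t:t + n, ids].trace()   # sum of (frame i, phoneme i) matches
            for t in range(index.shape[0] - n)
        ])

    index = np.random.rand(1000, len(PHONEMES))   # fake stored activations
    hits = query_scores("deepgram", index)
    print(int(hits.argmax()), float(hits.max()))  # best-matching frame offset
    ```

    In practice you would want a smarter alignment than one frame per phoneme, since phones span many frames, which is exactly the kind of question deepa raises above about where in the network the index is built.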
