In The Future of Voice AI interview series, I ask my guests three questions:

- What problems do you currently see in Enterprise Voice AI?
- How does your company solve these problems?
- What solutions do you envision in the next 5 years?

This episode’s guest is Scott Stephenson, Co-Founder & CEO at Deepgram.

Scott is a dark matter physicist turned Deep Learning entrepreneur. He earned a PhD in particle physics from the University of Michigan, where his research involved building a lab two miles underground to detect dark matter. Scott left his postdoctoral research position in physics to found Deepgram.

Deepgram is one of the largest API companies offering Speech AI technologies such as Speech-to-Text, Audio Intelligence and the recently launched Text-to-Speech. Deepgram’s technology provides high accuracy and naturalness across multiple languages and accents. The major use cases include contact centers, conversational AI, media transcription, and speech analytics.

Recap Video

Takeaways

  • Deepgram builds its own ASR models, which gives it the ability to tune and scale them

  • Their infrastructure handles, on average, 100K real-time conversations at any given moment of the day

  • It’s easy to get an AI model to work, but far harder to scale it at a 10x lower price

  • The vast majority of Deepgram use cases are Speech-to-Text, but Text-to-Speech is starting to take off as well

  • When competing with large companies (Google, Amazon, Microsoft, etc.), it’s important to realize that you are not really competing with the entire company but with a small technical team that is generally less motivated than your startup

  • Accuracy, speed and price are the top 3 problems in Speech-to-Text

  • Speech-to-Text prices have already decreased by 10x. Another 10x decrease is unlikely in the near future, at least not for real-time use cases

  • Faster AI inference chips will allow for larger and more accurate models at the same price

  • Latency under 500 ms is critical for voice bot use cases

  • Deepgram offers super-low-latency STT and TTS today