In The Future of Voice AI series of interviews, I ask my guests three questions:

- What problems do you currently see in Enterprise Voice AI?
- How does your company solve these problems?
- What solutions do you envision in the next 5 years?

This episode’s guest is Kwindla Hultman Kramer, Co-Founder & CEO at Daily.

Kwin is CEO and co-founder of Daily, a developer platform for real-time audio, video, and AI. He has been interested in large-scale networked systems and real-time video since his graduate student days at the MIT Media Lab. Before Daily, Kwin helped found Oblong Industries, which built an operating system for spatial, multi-user, multi-screen, multi-device computing.

Daily makes developer tools and infrastructure for real-time audio, video, and AI. The company was founded in 2016 with the goal of making it easier to embed real-time communications into websites and applications. Today, Daily powers telehealth, education, workplace collaboration, customer support, social, and gaming applications for thousands of developers and product teams. Daily's core competence is delivering reliable, high-quality, low-latency audio and video streams to any device, on any network, anywhere in the world. 

Recap Video

Takeaways

  • The platform shift to AI is going to change how we think about computers and how we use them

  • Daily’s focus is real-time communications

  • Over the last 2 years, many of Daily’s customers have been asking whether AI participants can be part of their sessions

  • Daily is built to deliver low-latency voice and video, which is what interactive conversational AI applications need

  • To build an interactive AI app, you need to:

    • Send audio from user’s device

    • Transcribe the audio

    • Run LLM inference

    • Likely do an API call

    • Convert text to speech and send it back

    • This whole pipeline must be cancellable/interruptible at any point

  • Having an open-source layer for this is very important

  • Better to do it in the cloud than on-device

  • Daily delivers the transport layer, the bottom layer of the stack (WebRTC)

  • Daily has built an open-source layer called Pipecat to enable more apps

  • After the introduction of GPT-4o, the optimal architecture has changed entirely

  • GPT-4o collapses 3-4 steps that we had to do separately before (transcription, phrase endpointing, LLM inference, TTS)

  • Before GPT-4o, the best latency was 800ms. Now it’s 300ms.

  • GPT-4o audio feels first-class, and it opens up a whole bunch of new use cases

  • Daily is the WebRTC network glue between a user on a device and servers that are generating audio/video

  • It can support 100K+ people in a single session

  • Main use cases: Healthcare, education, workforce, social, gaming, customer support

  • Soon, all games will have real-time conversational AI characters in them
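
The pipeline from the takeaways (receive user audio, transcribe it, run LLM inference, possibly call an API, convert text to speech, and keep the whole thing interruptible) can be sketched roughly as follows. This is a minimal illustration only: every function and class name here is a hypothetical stand-in, the stages just sleep and pass text along, and none of it is the Daily or Pipecat API. The one idea it tries to show is that running the pipeline as a single cancellable task lets new user speech interrupt a response at whichever stage it happens to be in.

```python
import asyncio
from typing import List, Optional

STAGE_DELAY = 0.05  # assumed per-stage latency, for illustration only


# Hypothetical stand-ins for real services; not Daily or Pipecat APIs.
async def transcribe(audio: bytes) -> str:
    await asyncio.sleep(STAGE_DELAY)  # simulated speech-to-text
    return audio.decode("utf-8")


async def run_llm(prompt: str) -> str:
    await asyncio.sleep(STAGE_DELAY)  # simulated LLM inference
    return f"You said: {prompt}"


async def call_api(text: str) -> str:
    await asyncio.sleep(STAGE_DELAY)  # simulated external API call
    return text


async def synthesize(text: str) -> bytes:
    await asyncio.sleep(STAGE_DELAY)  # simulated text-to-speech
    return text.encode("utf-8")


async def respond(audio: bytes) -> bytes:
    """Run the full pipeline for one user utterance."""
    text = await transcribe(audio)
    reply = await run_llm(text)
    reply = await call_api(reply)
    return await synthesize(reply)


class Conversation:
    """Tracks the in-flight response so it can be interrupted."""

    def __init__(self) -> None:
        self._current: Optional[asyncio.Task] = None

    def interrupt(self) -> None:
        # Cancelling the task stops the pipeline at its current await point.
        if self._current is not None and not self._current.done():
            self._current.cancel()

    def handle_utterance(self, audio: bytes) -> asyncio.Task:
        # New user speech always interrupts the previous response.
        self.interrupt()
        self._current = asyncio.create_task(respond(audio))
        return self._current


async def demo() -> List[object]:
    convo = Conversation()
    first = convo.handle_utterance(b"hello")
    await asyncio.sleep(0.02)  # the user starts talking again mid-response
    second = convo.handle_utterance(b"actually, never mind")
    # return_exceptions=True reports the cancelled task instead of raising.
    return await asyncio.gather(first, second, return_exceptions=True)


if __name__ == "__main__":
    for result in asyncio.run(demo()):
        print(repr(result))
```

In the demo, the first utterance is interrupted before its response finishes, so only the second one produces audio. A real system would stream audio between stages instead of passing whole buffers, which is where most of the latency savings come from.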