Top Updates 💪
- Hume launches TTS model Octave with customizable AI voices (VentureBeat) 
- ElevenLabs is launching its own STT model (TechCrunch)
- ConverzAI raises $16M for AI recruiters with 30% efficiency boost (VentureBeat) 
- Salesforce and Google bring Gemini to Agentforce, enabling more customer choice in major partnership expansion (Salesforce) 
- Announcing free, unlimited access to Think Deeper and Voice (Microsoft) 
- Empowering innovation: The next generation of the Phi family (Microsoft) 
- Real-time translation, accent smoothing, AI agents – Krisp & CX Today explore the future of CX (CX Today) 
- Scammers use voice clips to create AI clones (CNET) 
- Zoom secures its largest-ever contact center deal (CX Today) 
- Speechmatics unveils speaker diarization to improve meetings AI (UC Today) 
- How AI voice will change advertising (Voices) 
- Telnyx unveils Voice AI for human-like conversations at scale (GlobeNewswire) 
- Deepdub partners with AWS to advance AI media localization (PR Newswire) 
- Cresta announces rapidly scaling AI voice agents in production (PR Newswire) 
- Bliro raises €28M for AI-powered conversation intelligence platform (Tech) 
- Bridgetown Research raises $19M in Series A funding (FinSMEs)
- GibberLink: Breakthrough in how voice assistants communicate AI-to-AI (eWeek) 
Voice AI Podcast 🎙️
In case you missed the latest episode of Voice AI Podcast…
Immersive Experiences with Voice AI | Alex Bordanova (Chief Product & Technology Officer at Voicemod AI)
In the Future of Voice AI interview series, I ask my guests three questions:
- What problems do you currently see in Enterprise Voice AI?
- How does your company solve these problems?
- What solutions do you envision in the next 5 years?
Engineering Corner 😎
- Speech emotion recognition using fine-tuned wav2vec 2.0 and neural controlled differential equations classifier (ResearchGate)
- The technical blueprint behind SuperDial’s healthcare voice agents (Opus Research)
- Deepgram’s STT model secret: synthetic data generation (Data Science Central)
- How the Emilia dataset advances multilingual voice synthesis (MarkTechPost) 
- Combining TF-GridNet and mixture encoder for continuous speech separation for meeting transcription (arXiv)
- Enhancing multimodal AI: bridging audio, text, and vector search (Dev) 
- Unlocking scalable audio transcription with Gemini (Google Cloud)
- Best AI voice agents (Play) 









