Mistral launches Voxtral, Krisp launches VIVA, Hume launches EVI 3 and much more

Voice AI weekly digest

Davit Baghdasaryan

Jul 21, 2025

Top Updates 💪

Mistral releases Voxtral, its first open-source AI audio model (TechCrunch)
Krisp launches VIVA SDK for improving turn-taking and achieves 1B mins/month milestone (SiliconAngle)
Adobe's new AI tool turns silly noises into realistic audio effects (The Verge)
Hume AI launches EVI 3, a speech-to-speech model that mimics voice, style (X)
NVIDIA AI releases Canary‑Qwen 2.5B ASR‑LLM hybrid model (MarkTechPost)
NVIDIA releases open-source model Audio Flamingo 3 (MarkTechPost)
Microsoft says this new voice conversion feature will improve AI dubbing (Slator)
An open-source conversational AI platform, intervo.ai is now live (X)
Zoho develops large language model with speech recognition (The Hindu)
Enhancing multilingual speech with NVIDIA Riva TTS (NVIDIA Developer Blog)
Vonage partners with AWS to unveil AI voice agent integration (PR Newswire)
Deepgram receives 2025 voice AI excellence award (Morningstar)
Squaretalk’s voice AI for fintech onboarding and fraud prevention (Fintech News)
How synthetic voice is redefining multilingual communication (Telecom Reseller)
AI voice tech boosts health insurance call center efficiency (Medium)
UCaaS and CCaaS: A strategic move for the modern enterprise (UC Today)
AI-powered TWS earbuds: Real-time translation and transcription (Yanko Design)
Peter Piper Pizza introduces voice AI phone ordering (SoundHound)
Conversational banking: AI-powered financial CX (CX Network)
SF AI startup Cluely wants to help you ‘cheat on everything’ (KRON4)
ChatGPT Agent and the autonomous AI revolution (FourWeekMBA)
Mistral AI adds voice AI tools to Le Chat (Artificial Intelligence News)

Voice AI Podcast 🎙️

In case you missed the latest episode of Voice AI Podcast…

Engineering Corner 😎

Audio Flamingo 3 is now released (X)
The pipecat-esp32 client runs on the AtomS3R (LinkedIn)
ThinkSound by Alibaba for audio generation and editing (X)
Open-source framework for real-time AI voice (X)
Indic-Parler-TTS: 24 languages, 8000+ hours, 80k+ downloads (X)
IndexTTS2: A breakthrough in emotionally expressive and duration-controlled auto-regressive zero-shot TTS (Index‑TTS2)
UniSLU: Unified spoken language understanding from heterogeneous cross-task datasets (arXiv)
Fast inference end-to-end speech synthesis with style diffusion (MDPI)
Got ChatGPT Plus? Record and summarize meetings on a Mac now (ZDNet)
WhisperKit: On-device real-time ASR with billion-scale transformers (arXiv)
Building voice agents: The revolutionary future of customer support is here (DEV)
Top open‑source TTS models in 2025 (Modal)
Speechmap.ai: Free speech dashboard for AI (SpeechMap.AI)
Syllable.ai: Revolutionizing enterprise Voice AI automation (SaaStr)
Whisper in the Wild: OpenAI's STT model in production (AI Mind)
AI TTS could “unlearn” how to imitate certain people (MIT Technology Review)
