What a week! GPT-Realtime, MS VibeVoice, Cloudflare Voice Agents and much more 🔥
Voice AI weekly digest
Top Updates 💪
GPT-Realtime and Realtime API updates for production voice agents (OpenAI)
Microsoft AI launches its first in-house models (The Verge)
48% of CX leaders plan to access AI via BPO partners (360 Magazine)
VibeVoice-1.5B: Open-source, long-form, expressive TTS from Microsoft (X)
Assort Health nabs $50M to automate patient phone calls (TechCrunch)
Live translation and language learning tools in Google Translate (Google Blog)
Cloudflare is the best place to build realtime voice agents (Cloudflare)
Salesforce, Berkeley unveil BFCL Audio benchmark for voice AI (StartupHub.ai)
xAI tests new object highlighting feature in Grok voice mode (TestingCatalog)
Google's NotebookLM now talks in 80+ languages (TechTimes)
Apple reportedly wants to integrate Gemini into Siri (ProductNation)
Plaud upgrades its card-sized AI note-taker with better range (The Verge)
Thoma Bravo acquires Verint to join forces with Calabrio (Business Wire)
LivePerson integrates AWS to unify voice and digital CX (Business Wire)
Sendbird launches voice AI agents for human-like conversations (SiliconANGLE)
Rime transforming healthcare with voice AI (Oracle Blog)
Voice AI market surge: $20B enterprise adoption reality (Dataconomy)
Vox AI raises $8.7 million to bring voice AI to restaurants (PYMNTS)
ElevenLabs Agents can now navigate IVR phone trees (X)
Notta and ElevenLabs launch "Voices for All" (GlobeNewswire)
Copilot’s new audio AI sounds more personal than ChatGPT (Windows Latest)
Engineering Corner 😎
April – Voice AI to manage your email and calendar (Hacker News)
ClearMask: Noise-free and naturalness-preserving protection against voice deepfake attacks (arXiv)
Step-Audio 2: New end-to-end multimodal LLM for audio & speech (X)
State-of-the-art image generation Leonardo models and TTS Deepgram models now available in Workers AI (The Cloudflare Blog)
VoiceType AI: STT tool for Mac and PC hits 99.7% accuracy (Cult of Mac)
Marvis: A new open-source local-first TTS (LinkedIn)
Whisper Notes: Offline STT powered by Whisper Large-V3-Turbo (AppleVis)
Voice Agent in AI: Top nine platforms for 2025 (MarkTechPost)