Crazy Week 🔥 Updates from Anthropic, OpenAI, Krisp, Assembly and much more!
Voice AI weekly digest
Top Updates 💪
Assembly launches Universal Pro-3 streaming (Assembly)
Anthropic launches voice mode for Claude Code (MLQ)
OpenAI develops a ‘Bidirectional’ Audio Model to boost Voice Assistants (The Information)
Krisp launches listener-side, real-time Accent Conversion (SiliconANGLE)
AI Vocal Cloning and the Limits of Voice-Based Authentication (BISI)
Huawei launches next-generation voice virtual agents (Huawei)
Modulate adds nuance to voice analysis (NoJitter)
Deutsche Telekom partners with ElevenLabs to bring AI assistant to calls (Wired)
Alibaba Tongyi unveils Fun-CosyVoice3.5 and Fun-AudioGen-VD with FreeStyle voice generation (Pandaily)
Voice AI platform VoiceLine raises 10M EUR in series A (Slator)
LevelAI expands agentic CX platform (Customer Service Manager)
Talkdesk CX accelerates patient access with agentic AI (GlobeNewswire)
Syntiant to showcase always-on AI voice solutions (GlobeNewswire)
ElevenLabs & Google dominate Artificial Analysis’ STT benchmark (The Decoder)
DiligenceSquared uses AI to make M&A research affordable (TechCrunch)
3CLogic chosen to enhance ServiceNow-driven managed services (PR Newswire)
How large-scale speech models will impact voice AI (Forbes)
Why advanced voice agents require owning the voice stack (Call Centre Helper)
iFLYTEK Globally Launches AI Glasses and AI Interpret Mic (GlobeNewswire)
Meeami Technologies, Alif Semiconductor to demonstrate ultra-efficient edge AI noise suppression (Bluffton Today)
Sensory brings always-on AI speech and biometrics to Snapdragon Wear Elite (Democrat and Chronicle)
Engineering Corner 😎
Spectre I, the first smart device to stop unwanted audio recordings (X)
Google releases WAXAL, an open-access dataset with 2,400+ hours of high-quality speech data for 27 Sub-Saharan African languages spoken by 100M+ people
Introducing KokoClone: Kokoro TTS, but it clones voices now (Reddit)
VietSuperSpeech: A large-scale Vietnamese conversational speech dataset (arXiv)
ZeSTA: Zero-shot TTS augmentation with domain-conditioned training for data-efficient personalized speech synthesis (Takara TLDR)
How to compare latency and accuracy in voice recognition (Goodcall)
FineVoice review: Voice cloning in 30 seconds (Unite.AI)
Improving automatic speech recognition for kids (DrivenData)
Comparing STT algorithms for transcribing survey voice data (Oxford Academic)
Top 10 voice AI agent platforms: Features, pros, cons & comparison (Best DevOps)
Best voice AI for fraud detection workflows (Goodcall)

