xAI launched the Grok voice agent API, Meta launched SAM Audio, and more this week!
Voice AI weekly digest
Krisp is hiring!
Krisp’s SDK team has three key openings: Sr Product Manager, Sr Solution Engineer and BD Manager. If you know exceptional people who might be a great fit, please share these roles with them. Thank you 🙏
Top Updates 💪
x.ai launched Grok voice agent API (xAI)
Meta launched SAM Audio: The First Unified Multimodal Model for Audio Separation (Meta Blog)
Gemini 3 Flash: Frontier intelligence built for speed (Google Blog)
Meta’s AI glasses can now help you hear conversations better (TechCrunch)
Chatterbox Turbo: Model that beats ElevenLabs Turbo & Cartesia Sonic 3 (X)
OpenAI releases new models for its realtime API (The Decoder)
Leading the conversation with conversational AI in Amazon Connect (AWS)
Krisp Accent Conversion SDK for desktop, mobile and server platforms (SpeechTechMag)
Emotii to debut agentic multilingual communication platform (PRWeb)
Speechmatics Startup Program targets robust speech recognition (LinkedIn)
PolyAI raises $86 million as fight to answer calls with AI heats up (Forbes)
AI sound generator startup Mirelo grabs $41M seed round (Tech.eu)
CoeFont launches AI-powered interpreter (GlobeNewswire)
Voximplant and Deepgram bring voice AI to real-world calls (GlobeNewswire)
AI transcription is rewriting journalism and media workflows (Telecom Reseller)
Retell AI targets human bottlenecks in agentic voice AI (SiliconANGLE)
Voice Convert AI: Turn every inquiry into real sales opportunities (TechBullion)
Vonage expands contact center with Salesforce Agentforce voice (MarTechSeries)
NVIDIA Nemotron 3: Hybrid Mamba-Transformer open source models (Smol.ai)
Synthetic voice attacks cost insurers and consumers billions (Insurify)
ElevenLabs launches Lovable integration (Blockchain.News)
The state of customer experience: What 2025 has taught (CXM Today)
AudioCodes deploys Voca CIC voice agent with Atento (PR Newswire)
Resemble AI raises $13 million to tackle deepfake crisis (PYMNTS)
Gnani.ai launches Indic STT model under IndiaAI Mission (Inc42)
Voice Agent: Conversational AI for calls, platforms & tech (Technology.org)
Engineering Corner 😎
Tongyi FUN levels up with major TTS and STT upgrades (X)
POLYN’s ultra-low-power AI voice detection chips (TrendHunter)
7 best AI voice typing and speech-to-text tools (Unite.AI)
Advances in singing voice synthesis (QuantumZeitgeist)
Adapt tone to user sentiment in voice AI and integrate calendar checks (DEV)
Source separation for ASR: How machines learn to untangle mixed audio (DEV)
Automatic speech recognition in a noisy world (DEV)
Reproducing and dissecting denoising language models for ASR (arXiv)




The POLYN ultra-low-power chip mention caught my attention for on-device inference. Power efficiency at the edge is the real constraint for mobile voice AI, not just model size. I've been tracking how NPU integration in phones is shifting workloads away from cloud APIs, and these specialized voice chips could be the next phase once battery life becomes the bottleneck again.