Top Updates 💪
Apple acquires AI startup Q.ai for a reported $2B (SiliconANGLE)
Movate and Krisp partner on AI voice solutions for CX (Business Standard)
Agora collaborates with Microsoft Azure AI to enable a real-time, intelligent, and interactive future across 140+ languages (Microsoft)
Voice AI is booming but without CX observability, it will break (CX Today)
Retell AI upgrades voice platform; revenue tops $40M ARR (Markets Insider)
Google to pay $68M over voice assistant eavesdropping claims (CBS News)
Moving to BPO-hosted voice AI? Risks & path forward (Contact Center Pipeline)
Five Guys extends partnership with SoundHound AI (The Globe and Mail)
Boldvoice raises $21M for AI voice coaching (PR Newswire)
These California companies want you to ditch your keyboard (Los Angeles Times)
CommBox unveils Era AI Voice to transform call centres (ITBrief)
Germany’s largest grocery retailer turns to LYDIA Voice (Pressebox)
Telus & RingCentral expand Business Connect with AI features (Telecom Reseller)
How RingCentral’s agentic AI unifies the experience (RingCentral Blog)
Synthesia raises $200M at $4B valuation for AI avatars (SiliconANGLE)
CyberloQ and IngenID partner to add voice biometrics and deepfake detection to location-based MFA (IDTechWire)
AI-Media to showcase real-time translation and accessibility workflows at ISE 2026 (GlobeNewswire)
Voice AI: Come to the dark side (No Jitter)
Engineering Corner 😎
Qwen3-ASR & Qwen3-ForcedAligner are now open-sourced (Qwen)
Qwen3 TTS and the case for token-based speech synthesis (HackerNoon)
Running TTS fully in the browser with PocketTTS (DEV)
VIBEVOICE-ASR technical report (arXiv)
SpatialEmb: Extract and encode spatial information for 1-stage multi-channel multi-speaker ASR on arbitrary microphone arrays (arXiv)
A ground-truth-free framework for validating emotions in generative AI speech synthesis (IEEE Xplore)
A wireless, battery-free artificial throat patch with deep learning for emotional speech recognition (Wiley Online Library)


Solid roundup. The Agora x Azure collaboration on multilingual real-time voice caught my eye because cross-language latency is still a huge problem in practice. Most teams underestimate how much lag kills conversational flow, even when transcription accuracy is decent. Targeting 140+ languages is ambitious, but actual deployment quality usually varies a lot across language pairs.