Top Updates
Mistral launches Voxtral TTS - Open-weight 4B TTS model. 9 languages, 90ms TTFA, 6x RTF. Runs on consumer GPUs. Mistral claims it beats ElevenLabs on quality benchmarks. (TechCrunch) (Mistral blog)
Cohere releases Transcribe - Open-source 2B ASR model built for edge. 14 languages, 5.42 avg WER on HF Open ASR leaderboard, beating Zoom Scribe v1, IBM Granite 4.0, ElevenLabs Scribe v2, and Qwen3-ASR. Free via API and HuggingFace. (TechCrunch) (Cohere blog)
Google ships Gemini 3.1 Flash Live + Search Live goes global - Real-time voice/video model with native function calling. 90.8% on ComplexFuncBench Audio (~20% jump over prev gen). Now powers Search Live in 200+ countries with voice and camera input. (Google blog) (TechCrunch)
Smallest AI launches Lightning V3 - 3.89 MOS in conversational evals, claims to beat OpenAI, Cartesia, and ElevenLabs. 15 languages with auto-detection and mid-sentence switching. Voice cloning from 5-15s of audio. (Smallest.ai blog)
Amazon Polly adds Bidirectional Streaming - Stream text to Polly token-by-token as your LLM generates it, get audio back in real time over HTTP/2. 39% faster than batch approach, collapses 27 API calls to 1 on a 970-word passage. GA now. (AWS blog)
AWS adds WebRTC to Bedrock AgentCore - Pipecat voice agents now run on AgentCore Runtime with bidirectional WebSocket and WebRTC. Supports barge-in. Ready-to-deploy examples with Pipecat, Nova Sonic, LiveKit, and Strands SDK. (AWS blog)
Genesys reports record Q4 - Genesys Cloud at ~$2.6B ARR, 35%+ YoY growth. 70%+ of customers now on AI. AI-powered conversations up 120% YoY. AI is 20% of new ACV, with 10+ deals where AI exceeded half the contract value. (Genesys)
Artificial Analysis updates voice benchmarks - AA-WER v2.0 adds conversational AI, EU Parliament speech, and financial call datasets. ElevenLabs Scribe v2 leads at 2.3% WER. Best value: Mistral Voxtral Small at 3.0% WER / $4 per 1K min. TTS Arena: Inworld TTS-1.5-Max at #1, ELO 1,160. (X post)
AI chatbots handle 60%+ of banking support - BofA Erica: 1.5B+ interactions, 98% resolved without human. Klarna AI: 66% of inquiries, saving $40M/yr. Gartner projects $80B in contact center labor cost cuts in 2026. (TechBullion)
The economics of AI vs human agents - Voice AI now costs ~$0.40/call vs $7-12 for a human agent: 90-95% cost reduction per interaction. Analysis of how this is reshaping contact center staffing. (Medium)
Agentic Voice AI goes mainstream - 1 in 10 customer service interactions projected to be fully automated by agentic voice AI in 2026. 80% of businesses plan to deploy. RingCentral shipped AIR Pro, an agentic voice platform embedded in its comms stack. (Telecom Reseller)
Salesforce Agentforce Contact Center - Native CCaaS unifying voice, digital channels, CRM, and AI agents in one stack. Voice now built into the CRM on Hyperforce. GA since Feb 23. (Cloud Wars)
Otter.ai hits 35M users, $100M ARR - Sam Liang interview. $100M ARR with <200 employees ($500K+ rev/employee). #14 on Forbes 2026 Best Startup Employers. Liang: 2026 is "the year of the voice." (YouTube)
Engineering Corner
Gladia open-sources WER normalization library - Normalizes transcripts before computing WER to eliminate false penalties from formatting differences ("$50" vs "fifty dollars"). Configurable YAML pipelines for fair cross-engine ASR comparison. (GitHub) (LinkedIn - Gevorg Minasyan)
MacWhisper - Mac-native local transcription using Whisper and Nvidia Parakeet. 300K copies sold. Batch processing, YouTube transcription, auto-recording Zoom/Teams/Webex. All on-device. (Trend Hunter)
Logan Kilpatrick on Gemini 3 Flash - Google DeepMind's Logan Kilpatrick discusses the latest Gemini model capabilities. (X post)
Google Docs adds Gemini-powered audio proofreading - "Listen to this" reads docs aloud with AI voices. 0.5x-2x playback. Also ships audio summaries: condenses long docs into ~3min podcast-style recaps. Desktop, English only for now. (MakeUseOf)
Rekam AI - All-in-one voice platform: TTS, STT, voice cloning, custom voice creation. 2,000+ voices, 20+ languages. Free unlimited tier for Kokoro models. (Dynamic Business)
Klassifier - AI-powered audio classification tool. (Trend Hunter)
ViciStack on call center AI voice agents - Overview of real-time conversation handling, reduced wait times, and automated workflows in production contact centers. (ViciStack)
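
Why the WER normalization item above matters, as a rough sketch: the snippet below scores the same transcript pair with and without normalization. It is not Gladia's library or its API. The `normalize` function, the `REPLACEMENTS` table, and the sample sentences are hypothetical stand-ins, and a real pipeline would use a full number and currency verbalizer rather than a one-entry lookup.

```python
import re

# Hypothetical one-entry table for illustration only; a real normalizer
# would verbalize arbitrary numbers, currencies, dates, and so on.
REPLACEMENTS = {"$50": "fifty dollars"}


def normalize(text: str) -> list[str]:
    """Lowercase, expand known written forms, drop punctuation, split into words."""
    text = text.lower()
    for written, spoken in REPLACEMENTS.items():
        text = text.replace(written, spoken)
    text = re.sub(r"[^\w\s']", " ", text)  # keep word characters and apostrophes
    return text.split()


def word_errors(ref: list[str], hyp: list[str]) -> int:
    """Word-level Levenshtein distance (substitutions + deletions + insertions)."""
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, 1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, 1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def wer(ref: list[str], hyp: list[str]) -> float:
    """Word error rate: edit distance divided by reference length."""
    if not ref:
        raise ValueError("reference must contain at least one word")
    return word_errors(ref, hyp) / len(ref)


reference = "It costs fifty dollars."
hypothesis = "it costs $50"

raw_wer = wer(reference.split(), hypothesis.split())
normalized_wer = wer(normalize(reference), normalize(hypothesis))

print(f"raw WER:        {raw_wer:.2f}")         # 0.75
print(f"normalized WER: {normalized_wer:.2f}")  # 0.00
```

Without normalization, casing, punctuation, and the "$50" formatting count as three errors against a four-word reference, even though the engine heard every word correctly. After normalization the two transcripts match exactly.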

