Voice AI's Consolidation Begins

Voice AI weekly digest

Davit Baghdasaryan

Apr 27, 2026

Two important Voice AI events in the coming weeks:

Twilio Signal - May 6-7 in SF
Cerebral Valley Voice Summit - May 6 in SF

Top Updates 💪

xAI launches Grok Voice Think Fast 1.0, ranking #1 on the tau-voice Bench for full-duplex voice agents and already powering Starlink support with a 20% sales conversion rate. (xAI)
Anker unveils THUS, the first compute-in-memory AI audio chip, claiming 150x more on-device AI power for noise cancellation in its upcoming Soundcore earbuds. (The Verge)
SoundHound acquires LivePerson for $43M, combining voice agentic AI with LivePerson’s digital messaging platform that handles one billion customer messages per month. (GlobeNewswire)
Krisp's Voice AI SDK wins two Webby Awards for Technical Achievement. (LinkedIn)
Speechmatics delivers on-device STT for Adobe Premiere, transcribing an hour of video in 55 seconds offline with accuracy within 5% of cloud. (TV Technology)
Nothing launches Essential Voice, an AI dictation tool that removes filler words and formats speech-to-text system-wide in 100+ languages. (TechCrunch)
Synthflow AI and 8x8 partner to embed no-code voice AI agents directly into the 8x8 Contact Center platform across 30+ languages. (VentureBeat)
Google Meet AI note-taking now works for in-person meetings, generating transcripts, summaries, and action items from face-to-face conversations via mobile. (Lifehacker)
Xiaomi releases MiMo v2.5 TTS and open-sources MiMo v2.5 ASR, a full voice pipeline with voice cloning, voice design, and dialect-aware recognition for the agent era. (Gizmochina)
Volkswagen will ship voice AI in all China-built cars starting in H2 2026, using on-device LLMs from Tencent, Alibaba, and Baidu. (CNBC)
Newo appoints new CEO after $25M Series A to scale partner-led voice AI infrastructure for MSPs, VoIP providers, and software platforms serving SMBs. (GlobeNewswire)
Ericsson embeds AI calling and fraud detection into IMS, partnering with Hiya for real-time spam blocking as 86% of unknown calls go unanswered. (Ericsson Blog)

Engineering Corner 😎

Streaming TTS models fail on over 60% of sentences containing phone numbers, dates, and prices because they see 5-20x less context than in batch mode. (Technology.org)
AI neck sensor turns silent speech into voice by reading microscopic throat muscle movements with a CNN+transformer pipeline from POSTECH. (Digital Trends)
AWS guide to cost-effective multilingual transcription at scale using NVIDIA Parakeet TDT and AWS Batch. (AWS Blog)
Ghost Pepper: open-source browser extension for real-time voice transcription and LLM-powered responses. (GitHub)
A deep dive into the Mimi codec's layered audio compression design for neural speech coding. (LetsDataScience)
AssemblyAI showcases configurable STT with tunable turn-taking, medical mode for streaming, and real-time speaker labeling. (TipRanks)

