Soniox launches v5, Bland raises $50M, Mistral ships Voxtral Transcribe 2 and more

Voice AI weekly digest

Davit Baghdasaryan

Jun 22, 2026

Events

Voice AI Meetup Madrid is a small gathering hosted by Deepgram, AWS, and Pipecat for founders and engineers building with voice AI in Spain. (Jun 23, Madrid | Pipecat)
Boba-thon is a hands-on AI build night by AI Valley × Workato: teams form, prototype AI workflows, and demo by the end of the night. (Jun 25, San Francisco | Voice AI Space)
UK & Ireland Speech Workshop brings together speech science researchers and industry builders around advances in healthcare speech tech. (Jun 22-24, London | UKIS2026)

Top Updates 💪

Bland raises $50M Series C led by Dell Technologies, bringing its total funding past $100M. (Fortune)
Soniox launches v5 Real-Time and Async, a speech model that turns live conversations into structured, speaker-aware intelligence. (Soniox)
Google launches a $99 Gemini-powered Home Speaker, its first standalone smart speaker since the Nest Audio in 2020. (TechCrunch)
Plaud crosses $100M ARR in two years, making it the fastest hardware-led AI company to hit that milestone. (ITBrief)
Respond.io raises $62.5M Series B to expand its AI-powered customer messaging platform into North America and Europe. (MarTech Series)
Poland invests $11M in ElevenLabs and launches AI Lab Poland to grow its national AI ecosystem. (Mezha)
Mistral ships Voxtral Transcribe 2, an open-source on-device ASR model with batch transcription at $0.003 per minute. (Mistral)
Gnani AI launches Prisma v2.5, ranking first in 8 of 9 Indian language ASR benchmarks against Sarvam and ElevenLabs. (MediaNama)
Tencent Cloud and Inworld AI partner to integrate sub-130ms TTS into Tencent’s real-time communication infrastructure. (PR Newswire Asia)
Tencent Cloud and Soniox partner to bring multilingual speech-to-text across 200+ countries via Tencent RTC. (FutureCIO)
DeepL acquires Mixhalo’s ultra-low-latency audio team and technology to scale its real-time voice translation product. (PR Newswire)
TELUS Digital and Cresta partner to deliver AI agents alongside human agents in enterprise contact centers. (PR Newswire)
Parloa becomes the first agentic AI provider on Alvaria’s outbound platform, targeting regulated industries. (PR Newswire)
LiveKit Inference now defaults to zero data retention, meaning prompts and audio are never stored by any model provider. (LiveKit)
AI fraud cost $442B globally in 2025 as voice clones now fool even experts, per an INTERPOL report. (TechTimes)
UC study finds vocal similarity alone drives persuasion, with listeners complying more when a speaker’s voice matches theirs. (UC News)
AI voice clones are up to 20% more intelligible than real humans in noisy environments, a new JASA study shows. (PsyPost)
India’s telecom layer needs rebuilding for voice AI to scale, with traditional infrastructure adding 300-500ms of latency. (Inc42)
Multilingual voice AI is India’s next big opportunity, with 600M+ vernacular users driving enterprise demand. (Express Computer)

Engineering Corner 😎

Towards AI tutorial on using Gemini streaming TTS to make voice apps feel instant. (Towards AI)
Dev.to walkthrough of building a voice AI platform with 28 modules in Python. (Dev.to)
CTO field report on testing 184 AI text-to-speech models across quality, latency, and cost. (Dev.to)
Dev.to tutorial on simple text-to-speech in Python using PythonAIBrain. (Dev.to)

Voice AI Newsletter
