Updates from Krisp, OpenAI, ServiceNow and much more!

Voice AI weekly digest

Davit Baghdasaryan

May 11, 2026

Top Updates 💪

Krisp launches VIVA 2.0 with Turn Prediction v3 and a first-of-its-kind Interrupt Prediction model, all running on CPU with no transcription required. (Krisp Blog)
OpenAI launches three real-time audio models for its API: GPT-Realtime-2 with GPT-5-class reasoning, GPT-Realtime-Translate for live translation across 70+ languages, and GPT-Realtime-Whisper for streaming speech-to-text. (Reuters)

Twilio unveils a Conversation Layer at SIGNAL 2026 with persistent Memory, Orchestrator, Intelligence, and open-source Agent Connect for plugging in any AI provider. (MarTech)
Inworld ships Realtime TTS-2, a frontier voice model that reads user emotion and tone in real time and adapts pacing, softness, and empathy mid-conversation. (BusinessWire)
ServiceNow unveils Otto, a unified conversational AI layer combining Now Assist, Moveworks, and voice agents across every department and system. (The AI Economy)
SoundHound launches OASYS, a self-learning agentic platform that auto-builds, orchestrates, and improves voice AI agents from documentation and transcripts. (GlobeNewsWire)
ElevenLabs adds BlackRock, NVIDIA, and Jamie Foxx to its $550M+ Series D as annualized revenue crosses $500M, up from $350M at the end of 2025. (TechCrunch)
Greenhouse acquires Ezra AI Labs to bring voice AI interviewing into its ATS as applications per recruiter have spiked over 400% since 2023. (PR Newswire)
Ethos raises $22.75M from a16z for an expert network that onboards 35K people per week through voice AI interviews. (TechCrunch)
8x8 launches AI Studio in early availability, letting teams describe needs in plain language and deploy voice and digital AI agents without adding vendors. (CMSWire)
Wispr Flow bets on India as its fastest-growing market with Hinglish dictation support, 2.5M downloads, and 100% month-over-month growth. (TechCrunch)
ElevenLabs powers SpoonLabs’ audio novels, cutting production time from months to hours as PodNovel launches across Korea, Japan, and Taiwan. (DigitalToday)
eGain launches AI Agent IVA, a knowledge-powered virtual agent that replaces IVR dial trees with natural conversation and 24/7 voice support. (GlobeNewsWire)
Gnani.ai, which processes over 30M voice AI calls daily for 200+ enterprise customers in India, hires eight senior execs after its $10M Series B. (BusinessToday)
Vobiz.ai raises $1M seed to build AI-native telephony infrastructure in India with DID provisioning, low-latency SIP trunking, and LLM audio streaming. (Tech in Asia)
Twinnin, backed by Google and NVIDIA, targets a $3M seed round for its voice and face cloning marketplace, where actors license digital likenesses to studios. (Deadline)
BCM One partners with TD Synnex to bring Pure IP voice services and SkySwitch UCaaS to the MSP channel through the distributor’s partner network. (CRN)
AI note-taking earbuds go mainstream as Viaim and Mobvoi ship wireless earbuds that record, transcribe, and summarize meetings entirely on-device. (How-To Geek)

Engineering Corner 😎

OpenAI publishes its WebRTC infrastructure playbook, detailing a split relay + transceiver architecture that routes voice AI sessions for 900M+ weekly users at 300-500ms latency. (OpenAI Blog)

TypeWhisper open-sources Mac dictation with 10 ASR engines including WhisperKit, Parakeet, Apple SpeechAnalyzer, Groq, and xAI Grok STT, all running locally. (GitHub)
Dictee ships offline voice dictation for Linux as a KDE Plasma 6 plasmoid with Rust backend, 4 ASR engines, and NVIDIA Parakeet via ONNX Runtime. (GitHub)
TTS models for Indian languages: a dev survey covering Hindi, Tamil, Bengali, and Telugu with architecture comparisons and demo links. (dev.to)
Build a voice agent with LiveKit + AssemblyAI using Universal-3 Pro Streaming STT with function calling and MCP integration. (dev.to)

Voice AI Newsletter
