Updates from Salesforce, RingCentral, Microsoft and others!

Voice AI weekly digest

Davit Baghdasaryan

Mar 16, 2026

Top Updates 💪

Salesforce launches Agentforce Contact Centre (CXM World)
RingCentral unveils AIR Pro at Enterprise Connect (CX Today)
Microsoft announces Custom Voice for Dynamics 365 Contact Center (Microsoft)
Intron launches voice AI supporting 57 African languages (Kenyan Wallstreet)
Krisp launches customer accent conversion for global contact centers (CX Today)
Voice and language intelligence market size in 2026 (Precedence Research)
Hume AI appoints new CEO (PR Newswire)
ElevenLabs pledges to restore 1 million voices at SXSW (FindArticles)
AI customer support startup Wonderful AI raises $150 million (Bloomberg)
Devnagri AI launches multilingual enterprise speech AI (MarTech Series)
Spectrum Business and RingCentral expand partnership (Charter Corporate)
CallMiner adds AI classifiers, custom summaries to CX platform (CMSWire)
Sakura adds speech synthesis API to AI platform (Telecompaper)
Agora removes barriers to scalable voice AI agents (Globe Newswire)
ThinkrrAI advances its voice AI strategy (Manila Times)
How voicemail-to-email transcription can create privacy exposure (Paubox)
Outbound AI voice agents in Vodia v70 (Telecom Reseller)
Conversational AI solutions: Benefits, challenges & best practices (Nextiva)
AI ring startup takes on OpenAI and Meta in wearables (Upstarts Media)
Together AI launches voice agent platform with sub-700ms latency (MEXC)
Sinch unveils Voice Relay to power AI-driven calls (Telco News)
Ex-Apple engineer’s voice-only pendant raises $5M (TechBuzz AI)

Engineering Corner 😎

Hume AI: First open source TTS model, TADA (X)
How developers can bring voice AI into telephony applications (InfoWorld)
This AI can hear, translate, and speak back in 100 languages (Hacker Noon)
KrishokBondhu: A retrieval-augmented voice-based agricultural advisory call center for Bengali farmers (arXiv)
Causal prosody mediation for TTS: Counterfactual training of duration, pitch, and energy in FastSpeech2 (TLDR Takara)
The future of clearer speech is multimodal (Hacker Noon)
Fish Audio S2, a new generation of expressive TTS with controllable emotion (X)
JEPA-v0: Audio encoder for real-time speech translation (StartPinch)
Human brain and AI speech recognition decode speech similarly (TechXplore)
Cybersecurity and forensic audio analysis: Deepfake detection based on MFCC, audio-text disconsistency, and prosodic features (SCIRP)
Voice isolation iPhone guide (Think Design Blog)
Gemini embedding 2: Natively multimodal embedding model (Google Blog)
Building a TTS engine in pure C (Dev)
