Top Updates 💪
Claude's AI voice mode is finally rolling out - for free (ZDNet)
Hume launches voice model EVI-3 with rapid custom voice creation (VentureBeat)
ElevenLabs debuts Conversational AI 2.0 (VentureBeat)
IndiaAI selects SoketAI, GnaniAI, and GanAI to build next-gen LLMs (The Bridge Chronicle)
AI models using AirPods audio may detect heart rate (AppleInsider)
How real-time translation is and isn’t changing video conferencing (UC Today)
Strong Tie boosts customer experience with Liberate's voice AI (PR Newswire)
SoundHound and Allina launch AI agent for patient engagement (HIT Consultant)
Modulate brings live voice fraud & harm detection to Twilio users (Modulate)
ScotRail uses AI voice clone without speaker's consent (BBC)
The limits to using Siri in healthcare (Paubox)
Siro raises $50M Series B for AI Conversation Intelligence Platform (Pulse)
Rime secures $5.5M in seed funding (Rime.ai)
Invoca acquires Symbl, a Seattle startup that uses conversational AI (GeekWire)
Blobfish AI enables voice AI training for call centers (Trend Hunter)
Gemini TTS: The future of human-like audio content (Geeky Gadgets)
Soundcore Liberty 5: revolutionizing noise-canceling with adaptive ANC 3.0 and Dolby Audio (Bastille Post)
The power of audio-to-text technology (TechBullion)
Voice AI Podcast 🎙️
In case you missed the latest episode of Voice AI Podcast…
Building Voice AI at 11x | Francisco Izaguirre (Engineering Lead at 11x)
In the Future of Voice AI series of interviews, I ask my guests three questions:
- What problems do you currently see in Enterprise Voice AI?
- How does your company solve these problems?
- What solutions do you envision in the next 5 years?
Engineering Corner 😎
Motivational speech synthesis (YouTube)
SpeakStream: streaming text-to-speech with interleaved data (Apple Machine Learning Research)
Speech translation from Darija to classical Arabic: Performance analysis of Whisper, SeamlessM4T, and S2T models (IEEE Xplore)
mWhisper-Flamingo for multilingual audio-visual noise-robust speech recognition (IEEE Xplore)
Overlap-adaptive hybrid speaker diarization and ASR-aware observation addition for MISP 2025 challenge (arXiv)
Building a real-time audio transcription system with OpenAI’s realtime API (DZone)
200+ custom system prompts for voice-to-text post-processing (DEV Community)
10 TTS APIs that give voice to AI (Nordic APIs)
Maximize your content's reach with CapCut's TTS technology (FinSMEs)
TTS-o-matic: Local AI TTS tool for Unity (Unity Asset Store)
Timekettle W4 Pro translation earbuds leave competitors behind (Yanko Design)