Sam Altman is right and very wrong about AI-faked voices. And much more!
Voice AI weekly digest
Top Updates 💪
OpenAI CEO Sam Altman is right and very wrong about AI-faked voices (WashingtonPost)
Microsoft is doubling down on multilingual large language models – and Europe stands to benefit the most (ITPro)
Voice AI landscape in Europe 2025 (Sifted)
Freed says 20,000 clinicians are using its medical AI transcription ‘scribe,’ but competition is rising fast (VentureBeat)
Amazon is acquiring Bee, maker of a wearable AI assistant that listens to conversations (Geekwire)
Voice Fraud Prevention in the Age of AI and Hybrid Work (UCToday)
Gupshup raises $60M+ to expand its conversational AI and messaging platform (SiliconAngle)
AI voice company Hyper raises $6.3M to help automate 911 calls (TechCrunch)
Voxtral technical report (X)
Why AI Should Prioritize Conversations Over Automation in Outbound Sales (UniteAI)
AI Voice Assistants in UC: Build, Buy, or Bridge? (UCToday)
Speechmatics shipped realtime speaker diarization for voice agents (X)
How AI speech-to-text technology is tuning in to a digital Saudi Arabia (ArabNews)
Best AI Meeting Notes Assistants for Fintech Teams (Medium)
SayWrite.ai Is Redefining Productivity with AI-Powered Voice Note Taking (Medium)
Amplify Launches Custom AI-Powered Automatic Speech Recognition System (TheJournal)
Hume AI delivers speech models on SambaCloud (SambaNova)
Lightning Captions: Real-Time Transcription and Translation for the Classroom (PRNews)
Leena AI unveils conversational AI ‘colleagues’ for the enterprise (ComputerWorld)
Voice AI Podcast 🎙️
In case you missed the latest episode of Voice AI Podcast…
Voice is becoming a top channel again | Sharang Sharma (Vice President at Everest Group)
In the Future of Voice AI series of interviews, I ask my guests three questions:
- What problems do you currently see in Enterprise Voice AI?
- How does your company solve these problems?
- What solutions do you envision in the next 5 years?
Engineering Corner 😎
Introducing Version 2 of Higgs Audio Generation (Boson)
NAR-SREC: Nonautoregressive End-to-End Speech Recognition With Error Correction Decoder
RapFlow-TTS: Rapid and High-Fidelity Text-to-Speech with Improved Consistency Flow Matching
ZipVoice: Fast and High-Quality Zero-Shot Text-to-Speech with Flow Matching
Conan: A Chunkwise Online Network for Zero-Shot Adaptive Voice Conversion
Audio Flamingo 3: Advancing Audio Intelligence with Fully Open Large Audio Language Models
Mureka TTS V1: A new voice model from Mureka AI, integrated into their AI music and audio platform
Micdrop v2: An open-source set of TypeScript packages for building real-time voice conversations with AI agents
macos-local-voice-agents: A new open-source GitHub repository for running Pipecat voice AI agents locally on macOS
Koel-TTS: Enhancing LLM based Speech Generation with Preference Alignment and Classifier Free Guidance
Speaker Disentanglement of Speech Pre-trained Model Based on Interpretability
SpecASR: Accelerating LLM-based Automatic Speech Recognition via Speculative Decoding