TTS Arena V2 (Hugging Face)
Top Updates 💪
Nvidia open sources transcription model Parakeet-TDT-0.6B-V2 (VentureBeat)
Parloa raises $120M at $1B valuation to expand AI agent platform (SiliconANGLE)
Decagon in talks to raise $100 million at a $1.5 billion valuation (Forbes)
SoundCloud changes policies to allow AI training on user content (TechCrunch)
Sarvam AI launches TTS model with support for 11 Indian languages (The Hindu)
RNR Technologies raises $4M for voice and face AI in India (The Hindu)
OpenAI backs Vahan’s voice AI for hiring in India (TechStory)
RingCentral unveils RingCX for Salesforce Service Cloud Voice (Telecom Reseller)
NUGEN Audio unveils DialogCheck for speech intelligibility (TVNewsCheck)
The role of conversational AI agents in APAC contact centers (CX Network)
ElevenLabs introduces open-source Audio Starter Kit (LatestLY)
Ethiopia's Hasab AI launches voice platform for African languages (Addis Insight)
Infinix AI Buds break language barriers with real-time translation (TrendHunter)
AI voice cloning and dubbing: breaking language barriers globally (Raindance)
AI-powered headphones offer group translation with voice cloning (Tech Xplore)
Voice AI Podcast 🎙️
In case you missed the latest episode of Voice AI Podcast…
Building and Scaling Agent Copilot | Ruma Nair (Principal Product Manager at Twilio)
In the Future of Voice AI interview series, I ask my guests three questions:
- What problems do you currently see in Enterprise Voice AI?
- How does your company solve these problems?
- What solutions do you envision in the next 5 years?
Engineering Corner 😎
A new AI translation system for headphones clones multiple voices simultaneously (MIT Technology Review)
LLaMA-Omni2: LLM-based real-time spoken chatbot with autoregressive streaming speech synthesis (Papers with Code)
Open ASR leaderboard (Hugging Face)
Large language model alignment score for automatic TTS systems (GitHub Pages)
Haitian Creole ASR with limited labeled data (GitHub)
Transcribing audio files with OpenAI in Spring AI (Baeldung)
Speech recognition API for voice input (DEV Community)
Everything about voice AI agents in 2025 (DEV Community)