Top Updates 💪
After Babel Fish: The promise of cheap translations (The Hedgehog Review)
This new AI voice trainer can help you learn a new language (ZDNet)
Microsoft boosts contact center voice AI with new speech recognition (CX Today)
Amazon starts offering customer service in sign language (CX Today)
Voice cloning meets emotional speech synthesis in Alibaba’s Marco model (Slator)
Speechmatics sets record in medical STT with 93% accuracy (Morningstar)
Zoom introduces cross platform AI notetaker in latest update (Business Standard)
Xiaomi released its wildly human-like AI voice model (XiaomiTime)
Xiaohongshu (RedNote) has shared an open TTS model, FireRedTTS 2, on the hub (X)
Voice AI startup Keplar aims to replace traditional market research (TechCrunch)
Puzzel: Turning data into decisions with conversational AI (CX Today)
AudioCodes expands Voice CPaaS offering with AI Agents (PR Newswire)
How Deepgram is perfecting the future of customer service (Crowdfund Insider)
Communicate with confidence in 41 languages with Mondly (Entrepreneur)
Alberta Energy Regulator to implement 3CLogic's voice AI and contact center platform for ServiceNow IT service management (PR Newswire)
ShopAi announces £750,000 raise (Retail Technology Innovation Hub)
Clarity raises $12m in funding (FinSMEs)
Spara raises $15m in seed funding (FinSMEs)
Studio 3.0 by ElevenLabs: Advanced AI audio editor (Blockchain.News)
Neurotechnology launches multilingual NLP SDK with Baltic support (ID Tech)
Cerence AI & DSP Concepts collaborate to revolutionize in-car audio (StockTitan)
Voice AI Podcast 🎙️
In case you missed the latest episode of Voice AI Podcast…
End-to-end integrated Voice AI | Neil Hammerton (CEO & Co-Founder, Natterbox)
In the Future of Voice AI series of interviews, I ask my guests three questions:
- What problems do you currently see in Enterprise Voice AI?
- How does your company solve these problems?
- What solutions do you envision in the next 5 years?
Engineering Corner 😎
Reka Speech: An efficient and accurate transcription & translation model (X)
VoxCPM: Tokenizer-free TTS for context-aware speech generation and true-to-life voice cloning (Hugging Face)
Conversational AI features, real-world examples and how it works (TechBullion)
Data-independent beamforming for end-to-end multichannel multi-speaker speech recognition (arXiv)
AI meeting assistant using Kiro AI (DEV)
StreamMel: Real-time zero-shot TTS via interleaved continuous autoregressive modeling (IEEE Xplore)
AlterEgo: A non-invasive wearable device for silent speech recognition (Artificial Intelligence in Plain English)
Robust speaker recognition using perceptual stationary wavelet coefficients and prosodic feature in noisy conditions (IEEE Xplore)
Phaseper: A complex-valued transformer for ASR (IEEE Xplore)
Preservation of language understanding capabilities in speech-aware large language models (arXiv)
Audio post-production for video: Enhancing visuals with sound (Gearnews)
Predicting dementia through audio: Ensemble and deep learning approaches using acoustic features (ScienceDirect)