Top Updates 💪
Google Fi to add AI noise cancellation (TechCrunch)
OpenAI expands into music and speech translation (Digital Information World)
WellSaid boosts AI speech with faster, natural voice production (SiliconANGLE)
Otter.ai recasts itself beyond meeting notes (FindArticles)
Microsoft Copilot gets 12 big updates for fall (VentureBeat)
Aurigin.ai integrates audio deepfake detection with India’s misinformation combat alliance (ID TechWire)
Contact center economics in the age of voice AI: An inside look (CX Today)
NVIDIA, Microsoft, ElevenLabs top new ASR leaderboard (Slator)
AI-enabled voice systems move from hype to measurable impact (NoJitter)
Vogent released Vogent-Turn-80M turn-detection model (LinkedIn)
Voice AI and the future of human-computer interaction (ExpressComputer)
Whryte 4x faster than typing: Offline STT app for Mac (Geeky-Gadgets)
Zendesk AI Summit unveils resolution platform and Voice AI agents (Info-Tech)
Sesame raises $250M and launches beta (TechCrunch)
Dialpad expands into autonomous CX with agentic AI platform launch (Info-Tech)
Strong customer adoption of Webex AI Agent for contact center (Webex Blog)
Blobfish AI: Interview with CEO about the conversational voice company (Pulse2)
AI translation glasses make their academic debut (KoreaBizWire)
Our Latest Article
Africa’s Voice AI Moment
When people talk about AI and automation, they often forget where customer experience truly happens — in conversation.
Engineering Corner 😎
Fish Audio S1, the most expressive and natural TTS model on the market (X)
Neural audio codecs: How to get audio into LLMs (Kyutai)
MiMo-Audio — Xiaomi-MiMo-Audio demo (Xiaomi MiMo Audio Demo)
Summarizing speech: A comprehensive survey (arXiv)
Accurate semi-supervised ASR for ordinary and characterized speeches via multi-hypotheses-based curriculum learning (PLOS ONE)
Lightweight end-to-end diacritical Arabic speech recognition using CTC-transformer with relative positional encoding (MDPI)
Deep learning for inner speech recognition: a pilot comparative study of EEGNet and a spectro-temporal Transformer on bimodal EEG-fMRI data (Frontiers in Human Neuroscience)
Building a scalable, high-accuracy audio-to-text system: handling noise, crosstalk, and grammar (Medium)


