Top Updates 💪
Bringing state-of-the-art Gemini translation capabilities to Google Translate (Google Blog)
Improving Gemini TTS models for better control and capabilities (Google Blog)
Washington Post uses AI to create personalized podcasts (NPR)
Meta acquires wearable AI startup Limitless (AiBusiness)
How voice AI is changing the way restaurants handle phone reservations (Restaurant Technology News)
Voice agents that handle real phone calls through Twilio (X)
Google Pixel 10 Pro adds AI voice typing with Gemini & Tensor G5 (WebProNews)
Fluid and expressive conversations when you go Live with Search (Google Blog)
Meta partners with ElevenLabs to power AI audio across Instagram, Horizon (Economic Times)
Pine raises $25M Series A (Crowdfund Insider)
Resemble AI raises $13M for new approach to deepfake detection (SiliconANGLE)
Observe AI named a leader in the IDC MarketScape (GlobeNewswire)
The AI voice shortcut that unlocks serious productivity gains (Forbes)
How AI voice agents are revolutionizing business communication (TechBullion)
Pebble is making a weird little smart ring for recording thoughts (Engadget)
Lorikeet Voice 2.0 powers support during SNAP emergency rollout (PR Newswire)
SoundHound AI enables in-vehicle reservations with OpenTable (GlobeNewswire)
Imper.ai launches with $28M to stop impersonation attacks (SiliconANGLE)
Neosapience raises $11.5M for “emotionally intelligent” AI voice tech (Deadline)
TicNote Pods: The world’s first 4G AI note-taking earbuds (PR Newswire)
Engineering Corner 😎
Low-latency cascaded conversational agent in MLX (Apple Machine Learning)
AI headphones with smart noise cancellation and proactive listening (University of Washington News)
How to use Gemini Live API for native audio in Vertex AI (Google Cloud Blog)
Optimizing deep neural networks for EEG-based speech recognition: A multimodal approach to assistive communication (IEEE Xplore)
Audio Note offers accurate live transcription from mic, apps & files (TrendHunter)
Boost CSAT with VAD, backchanneling and sentiment routing (DEV)
Low-resource ASR by fine-tuning Whisper with Optuna-LoRA (MDPI)
Neuphonic & Google Cloud: Low-latency TTS (YouTube)
N-Gram and RNN-LM language model integration for end-to-end Amazigh speech recognition (MDPI)
Alibaba announces an upgraded version of ‘Qwen3-Omni-Flash’ (Gigazine)
Controlling digital head avatars via audio signals (TechXplore)
Google AI glasses launch in 2026 (CNBC)
Nvidia Broadcast mic quality (MakeUseOf)


Really solid roundup of what's happening in voice AI right now. The Gemini TTS improvements particularly caught my attention because better control over synthetic speech output could finally make these systems viable for customer-facing apps where tone matters. I've been testing similar models in a production environment lately, and the gap between "technically impressive" and "actually usable" is still wider than most demos suggest.