<?xml version="1.0" encoding="UTF-8"?><rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" version="2.0" xmlns:itunes="http://www.itunes.com/dtds/podcast-1.0.dtd" xmlns:googleplay="http://www.google.com/schemas/play-podcasts/1.0"><channel><title><![CDATA[Voice AI Newsletter]]></title><description><![CDATA[Voice AI insights from Krisp's CEO]]></description><link>https://voice-ai-newsletter.krisp.ai</link><image><url>https://substackcdn.com/image/fetch/$s_!YLgs!,w_256,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F831a2f7e-d0a7-4e3d-87a8-c42c65d0b71c_1000x1000.png</url><title>Voice AI Newsletter</title><link>https://voice-ai-newsletter.krisp.ai</link></image><generator>Substack</generator><lastBuildDate>Wed, 01 Jul 2026 10:00:06 GMT</lastBuildDate><atom:link href="https://voice-ai-newsletter.krisp.ai/feed" rel="self" type="application/rss+xml"/><copyright><![CDATA[Krisp Technologies]]></copyright><language><![CDATA[en]]></language><webMaster><![CDATA[krispai@substack.com]]></webMaster><itunes:owner><itunes:email><![CDATA[krispai@substack.com]]></itunes:email><itunes:name><![CDATA[Davit Baghdasaryan]]></itunes:name></itunes:owner><itunes:author><![CDATA[Davit Baghdasaryan]]></itunes:author><googleplay:owner><![CDATA[krispai@substack.com]]></googleplay:owner><googleplay:email><![CDATA[krispai@substack.com]]></googleplay:email><googleplay:author><![CDATA[Davit Baghdasaryan]]></googleplay:author><itunes:block><![CDATA[Yes]]></itunes:block><item><title><![CDATA[$200M Pours Into Voice AI, OpenAI Bidi-1 Leaks]]></title><description><![CDATA[Voice AI weekly digest]]></description><link>https://voice-ai-newsletter.krisp.ai/p/200m-pours-into-voice-ai-openai-bidi</link><guid isPermaLink="false">https://voice-ai-newsletter.krisp.ai/p/200m-pours-into-voice-ai-openai-bidi</guid><dc:creator><![CDATA[Davit 
Baghdasaryan]]></dc:creator><pubDate>Mon, 29 Jun 2026 14:03:12 GMT</pubDate><enclosure url="https://substack-post-media.s3.amazonaws.com/public/images/e90060d3-0692-46ad-84f2-adf95038b653_1670x928.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<h2>Events </h2><ul><li><p><strong>AI Engineer World&#8217;s Fair</strong> is a flagship AI engineering conference with a dedicated Voice &amp; Realtime AI miniconference featured this year (Jun 29-Jul 2, SF | <a href="https://www.ai.engineer/worldsfair/2026">AI Engineer</a>)</p></li><li><p><strong>Low Latency Lounge by Deepgram</strong> is an invite-only evening for engineers building the fastest AI in the stack. Together AI and Runware are cohosting (Jun 30, SF | <a href="https://luma.com/low-latency-lounge">LUMA</a>)</p></li><li><p><strong>Real-Time Voice AI &#215; Device Builders Meetup</strong> &#8220;Give Voice to Robots!&#8221; Runs alongside IVS Kyoto (Jul 2, Kyoto | <a href="https://www.voiceaispace.com/events/-real-time-voice-ai--device-builders-meetup-kyoto">Voice AI Space</a>)</p></li></ul><h2>Top Updates &#128170;</h2><ul><li><p><strong>AssemblyAI launches</strong> Universal-3.5 Pro Realtime, the first streaming STT model that takes the agent&#8217;s question as input (<a href="https://www.assemblyai.com/blog/universal-3-5-pro-realtime">AssemblyAI Blog</a>)</p></li><li><p><strong>Five9 launches</strong> Voice AI Agents and AI Agent Studio at CCW, bringing agentic CX to enterprise contact centers. (<a href="https://www.cxtoday.com/ai-automation-in-cx/five9-voice-ai-agents-agentic-cx-launch/">CX Today</a>)</p></li><li><p><strong>Krisp launches</strong> Voice Security for deepfake detection and fraud detection for contact centers. 
(<a href="https://www.cxtoday.com/security-privacy-compliance/krisp-expands-contact-center-ai-platform-with-voice-security-and-speech-analytics/">CX Today</a>)</p></li><li><p><strong>CallMiner launches</strong> real-time AI guidance that lets contact center agents initiate AI assistance on demand with human-in-the-loop controls. (<a href="https://www.businesswire.com/news/home/20260622825087/en/CallMiner-Enhances-Real-Time-Agent-Performance-and-Customer-Experience-with-New-AI-Capabilities">BusinessWire</a>)</p></li><li><p><strong>Assort Health raises</strong> $120M Series C led by Menlo Ventures at a $1.2B valuation to scale its voice AI agent platform across healthcare. (<a href="https://www.fiercehealthcare.com/ai-and-machine-learning/assort-health-scores-120m-series-c-scale-voice-ai-agent-platform-healthcare">Fierce Healthcare</a>)</p></li><li><p><strong>Prosper AI raises</strong> $30M Series A led by a16z to scale its autonomous patient journey platform, reporting 5x revenue growth in six months. (<a href="https://hackernoon.com/prosper-ai-raises-$30m-led-by-a16z-to-scale-autonomous-patient-journey-platform">HackerNoon</a>)</p></li><li><p><strong>Coval raises</strong> $28M Series A led by Norwest to advance its voice AI evaluation and testing platform, founded by an ex-Waymo engineer. (<a href="https://pulse2.com/coval-raises-28-million-series-a-to-advance-voice-ai-evaluation-platform/">Pulse2</a>)</p></li><li><p><strong>Kotoba Technologies raises</strong> $10M seed led by Kindred Ventures for its real-time East Asian voice translation platform with sub-2s latency. (<a href="https://gamesbeat.com/kotoba-technologies-raises-10m-for-real-time-voice-ai-platform-in-east-asia/">VentureBeat</a>)</p></li><li><p><strong>Valence AI raises</strong> $5M seed and secures US patents on real-time emotional detection from live speech. 
(<a href="https://www.prnewswire.com/news-releases/valence-ai-raises-5-million-secures-us-patents-on-real-time-emotional-detection-from-live-speech-302808293.html">PR Newswire</a>)</p></li><li><p><strong>TELUS Digital partners</strong> with ElevenLabs as a preferred implementation partner to scale voice AI alongside frontline customer care teams. (<a href="https://www.prnewswire.com/news-releases/telus-digital-and-elevenlabs-partner-to-scale-voice-ai-alongside-frontline-customer-care-teams-882149628.html">PR Newswire</a>)</p></li><li><p><strong>OpenAI&#8217;s GPT-Bidi-1</strong> leaks as a full-duplex voice model that can listen and speak simultaneously, enabling true bidirectional conversation. (<a href="https://cryptobriefing.com/openai-chatgpt-bidi-1-voice-model/">Crypto Briefing</a>)</p></li><li><p><strong>Conduent unveils</strong> a next-gen CX platform with real-time translation across 90+ languages to accelerate agent performance. (<a href="https://www.news.conduent.com/news/conduent-introduces-ai-powered-next-generation-cx-platform-to-expand-global-customer-reach-and-accelerate-agent-performance">Conduent</a>)</p></li><li><p><strong>Speechify brings</strong> free voice typing to all iPhone and Mac users, adding AI-powered dictation across every app. (<a href="https://9to5mac.com/2026/06/23/speechify-brings-voice-typing-to-all-iphone-and-mac-users/">9to5Mac</a>)</p></li><li><p><strong>Modulate launches</strong> an AI music detection API with 95% precision across 76 genres to help platforms verify AI-generated music. (<a href="https://www.morningstar.com/news/accesswire/1181688msn/modulate-launches-ai-music-detection-api-to-help-platforms-verify-ai-generated-music-at-scale">Morningstar</a>)</p></li><li><p><strong>ByteDance releases</strong> Seed Audio 1.0, a unified model that generates speech, music, and ambient sound from a single architecture. 
(<a href="https://www.citybuzz.co/2026/06/25/seed-audio-1-0-launches-unified-ai-audio-generation-for-speech-music-and-ambient-sound/">CityBuzz</a>)</p></li><li><p><strong>Amazon launches</strong> Alexa Plus Hindi beta in India, targeting 600M+ Hindi speakers with its upgraded AI assistant. (<a href="https://thenextweb.com/news/amazon-alexa-plus-india-hindi-beta-testing">The Next Web</a>)</p></li><li><p><strong>ElevenLabs adopts</strong> Google&#8217;s SynthID watermarking to tag all AI-generated speech, making synthetic voices easier to detect. (<a href="https://www.digitaltrends.com/cool-tech/ai-voices-are-getting-harder-to-spot-this-elevenlabs-feature-could-change-that/">Digital Trends</a>)</p></li><li><p><strong>Shure says</strong> audio quality is now the critical bottleneck for AI-powered meetings, and microphone clarity drives everything. (<a href="https://www.inavateonthenet.net/features/article/shure-says-audio-is-now-critical-to-ai-meetings--and-clarity-is-everything">InAVate</a>)</p></li><li><p><strong>Attention Labs launches</strong> SAA, a selective auditory attention layer that lets voice AI detect when it is being directly addressed. (<a href="https://www.dispatch.com/press-release/story/203592/attention-labs-launches-saa-the-engagement-control-layer-that-lets-voice-ai-know-when-it-is-being-addressed/">Dispatch</a>)</p></li><li><p><strong>Deepgram and Fortanix</strong> partner to run voice AI on-premises with NVIDIA confidential computing, keeping audio data encrypted during processing. 
(<a href="https://radioinfo.com.au/audioinfo/technology-news/private-voice-ai-introduced-to-better-protect-your-audio/">RadioInfo</a>)</p></li></ul><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://voice-ai-newsletter.krisp.ai/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe now&quot;,&quot;action&quot;:null,&quot;class&quot;:&quot;button-wrapper&quot;}" data-component-name="ButtonCreateButton"><a class="button primary button-wrapper" href="https://voice-ai-newsletter.krisp.ai/subscribe?"><span>Subscribe now</span></a></p><div><hr></div><h2>Engineering Corner &#128526;</h2><ul><li><p><strong>Gradium releases</strong> STT-Translate and S2S-Translate, real-time speech translation models that beat GPT Realtime Translate on accuracy and latency. (<a href="https://www.marktechpost.com/2026/06/24/gradium-launches-stt-translate-and-s2s-translate-real-time-speech-translation-models-beating-gpt-realtime-translate-on-accuracy-and-latency/">MarkTechPost</a>)</p></li></ul><ul><li><p><strong>AWS publishes</strong> a full tutorial on building a healthcare appointment agent with Amazon Nova 2 Sonic and Bedrock AgentCore. (<a href="https://aws.amazon.com/blogs/machine-learning/build-a-healthcare-appointment-agent-with-amazon-nova-2-sonic/">AWS Blog</a>)</p></li><li><p><strong>AssemblyAI shares</strong> four techniques for prompting Claude to build production-ready voice agents in about 30 seconds. (<a href="https://www.assemblyai.com/blog/prompting-claude-build-voice-agents">AssemblyAI Blog</a>)</p></li><li><p><strong>Deepgram discusses</strong> voice AI infrastructure and the path to production-grade agents on the Telecom Reseller podcast. 
(<a href="https://telecomreseller.com/2026/06/24/deepgram-on-voice-ai-infrastructure-and-the-road-to-production-grade-agents-podcast/">Telecom Reseller</a>)</p></li><li><p><strong>ACL 2026 publishes</strong> 10 voice AI papers covering noise-robust ASR, accented speech recognition, environment-aware TTS, controllable speech synthesis, multi-speaker diarization, and multilingual translation. (<a href="https://aclanthology.org/volumes/2026.acl-long/">ACL Anthology</a>)</p></li></ul><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://voice-ai-newsletter.krisp.ai/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">Get the most important news in Voice AI delivered directly to your inbox every week</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div>]]></content:encoded></item><item><title><![CDATA[Soniox launches v5, Bland raises $50M, Mistral ships Voxtral Transcribe 2 and more]]></title><description><![CDATA[Voice AI weekly digest]]></description><link>https://voice-ai-newsletter.krisp.ai/p/bland-raises-50m-soniox-launches</link><guid isPermaLink="false">https://voice-ai-newsletter.krisp.ai/p/bland-raises-50m-soniox-launches</guid><dc:creator><![CDATA[Davit Baghdasaryan]]></dc:creator><pubDate>Mon, 22 Jun 2026 14:02:55 GMT</pubDate><enclosure url="https://substack-post-media.s3.amazonaws.com/public/images/1b7f88f1-7c0e-4241-afaf-7a4cb9b5dace_1298x828.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<h2>Events </h2><ul><li><p><strong>Voice AI Meetup Madrid</strong> is 
a small gathering hosted by Deepgram, AWS, and Pipecat for founders and engineers building with voice AI in Spain (Jun 23, Madrid | <a href="https://www.pipecat.ai/events">Pipecat</a>)</p></li><li><p><strong>Boba-thon</strong> is a hands-on AI build night by AI Valley &#215; Workato - teams form, prototype AI workflows, and demo by end of night. (Jun 25, San Francisco | <a href="https://www.voiceaispace.com/events/boba-thon">Voice AI Space</a>)</p></li><li><p><strong>UK &amp; Ireland Speech Workshop</strong> brings together speech science researchers and industry builders around advances in healthcare speech tech. (Jun 22-24, London | <a href="https://sites.google.com/view/ukis2026/home">UKIS2026</a>)</p></li></ul><h2>Top Updates &#128170;</h2><ul><li><p><strong>Bland raises</strong> $50M Series C led by Dell Technologies, bringing its total funding past $100M. (<a href="https://fortune.com/2026/06/16/voice-ai-bland-50-million-after-being-rejected-by-180-investors/">Fortune</a>)</p></li><li><p><strong>Soniox launches</strong> <strong>v5</strong> Real-Time and Async, a speech model that turns live conversations into structured, speaker-aware intelligence. (<a href="https://soniox.com/blog/soniox-v5-real-time">Soniox</a>)</p></li></ul><ul><li><p><strong>Google launches</strong> a $99 Gemini-powered Home Speaker, its first standalone smart speaker since the Nest Audio in 2020. (<a href="https://techcrunch.com/2026/06/17/google-bets-on-gemini-to-reinvent-the-smart-home-speaker/">TechCrunch</a>)</p></li><li><p><strong>Plaud crosses</strong> $100M ARR in two years, making it the fastest hardware-led AI company to hit that milestone. (<a href="https://itbrief.com.au/story/plaud-says-arr-jumps-to-usd-100-million-in-two-years">ITBrief</a>)</p></li><li><p><strong>Respond.io raises</strong> $62.5M Series B to expand its AI-powered customer messaging platform into North America and Europe. 
(<a href="https://martechseries.com/sales-marketing/messaging/respond-io-raises-62-5m-series-b-to-scale-ai-powered-customer-conversations-into-north-america-and-europe/">MarTech Series</a>)</p></li><li><p><strong>Poland invests</strong> $11M in ElevenLabs and launches AI Lab Poland to grow its national AI ecosystem. (<a href="https://mezha.ua/en/news/polshcha-vkladaye-11-mln-u-rozvitok-shi-startapu-elevenlabs-312384/amp/">Mezha</a>)</p></li><li><p><strong>Mistral ships</strong> Voxtral Transcribe 2, an open-source on-device ASR model with batch transcription at $0.003 per minute. (<a href="https://mistral.ai/news/voxtral-transcribe-2/">Mistral</a>)</p></li><li><p><strong>Gnani AI launches</strong> Prisma v2.5, ranking first in 8 of 9 Indian language ASR benchmarks against Sarvam and ElevenLabs. (<a href="https://www.medianama.com/2026/06/223-gnani-ai-prisma-v2-5-speech-recognition-model-better-accuracy-sarvam/">MediaNama</a>)</p></li><li><p><strong>Tencent Cloud and Inworld AI</strong> partner to integrate sub-130ms TTS into Tencent&#8217;s real-time communication infrastructure. (<a href="https://en.prnasia.com/releases/apac/tencent-cloud-and-inworld-ai-announce-strategic-partnership-to-deliver-a-one-stop-lifelike-realtime-voice-ai-solution-537364.shtml">PR Newswire Asia</a>)</p></li><li><p><strong>Tencent Cloud and Soniox</strong> partner to bring multilingual speech-to-text across 200+ countries via Tencent RTC. (<a href="https://futurecio.tech/tencent-cloud-and-soniox-partner-to-elevate-enterprise-voice-ai/">FutureCIO</a>)</p></li><li><p><strong>DeepL acquires</strong> Mixhalo&#8217;s ultra-low-latency audio team and technology to scale its real-time voice translation product. 
(<a href="https://www.wallstreet-online.de/nachricht/21013238-eqs-news-deepl-expands-into-silicon-valley-adds-mixhalo-team-and-technology-to-accelerate-voice-ai-at-scale">PR Newswire</a>)</p></li><li><p><strong>TELUS Digital and Cresta</strong> partner to deliver AI agents alongside human agents in enterprise contact centers. (<a href="https://www.prnewswire.com/news-releases/telus-digital-and-cresta-partner-to-deliver-ai-agents-and-augment-human-agents-to-elevate-customer-experience-857014786.html">PR Newswire</a>)</p></li><li><p><strong>Parloa becomes</strong> the first agentic AI provider on Alvaria&#8217;s outbound platform, targeting regulated industries. (<a href="https://www.prnewswire.com/news-releases/parloa-and-alvaria-set-to-revolutionize-proactive-support-with-industry-first-in-agentic-cx-302802254.html">PR Newswire</a>)</p></li><li><p><strong>LiveKit Inference</strong> now defaults to zero data retention, meaning prompts and audio are never stored by any model provider. (<a href="https://x.com/livekit/status/2067319738926006387?s=20">LiveKit</a>)</p></li><li><p><strong>AI fraud cost</strong> $442B globally in 2025 as voice clones now fool even experts, per an INTERPOL report. (<a href="https://www.techtimes.com/articles/318458/20260616/ai-fraud-cost-world-442-billion-last-year-voice-clones-now-fool-even-experts.htm">TechTimes</a>)</p></li><li><p><strong>UC study finds</strong> vocal similarity alone drives persuasion, with listeners complying more when a speaker&#8217;s voice matches theirs. (<a href="https://www.uc.edu/news/articles/2026/06/ai-voice-cloning-vocal-similarity-uc-study.html">UC News</a>)</p></li><li><p><strong>AI voice clones</strong> are up to 20% more intelligible than real humans in noisy environments, a new JASA study shows. 
(<a href="https://www.psypost.org/ai-voice-clones-are-easier-to-understand-in-noisy-environments-than-real-humans/">PsyPost</a>)</p></li><li><p><strong>India&#8217;s telecom layer</strong> needs rebuilding for voice AI to scale, with traditional infrastructure adding 300-500ms of latency. (<a href="https://inc42.com/buzz/why-rebuilding-telecom-infrastructure-is-critical-for-next-wave-of-voice-ai/">Inc42</a>)</p></li><li><p><strong>Multilingual voice AI</strong> is India&#8217;s next big opportunity, with 600M+ vernacular users driving enterprise demand. (<a href="https://www.expresscomputer.in/guest-blogs/why-multilingual-voice-ai-is-indias-next-big-opportunity/136033/">Express Computer</a>)</p></li></ul><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://voice-ai-newsletter.krisp.ai/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe now&quot;,&quot;action&quot;:null,&quot;class&quot;:&quot;button-wrapper&quot;}" data-component-name="ButtonCreateButton"><a class="button primary button-wrapper" href="https://voice-ai-newsletter.krisp.ai/subscribe?"><span>Subscribe now</span></a></p><div><hr></div><h2>Engineering Corner &#128526;</h2><ul><li><p><strong>TowardsAI tutorial</strong> on using Gemini streaming TTS to make voice apps feel instant. (<a href="https://pub.towardsai.net/gemini-streaming-tts-how-developers-can-make-ai-voice-apps-feel-instant-01ef246f398e">Towards AI</a>)</p></li></ul><ul><li><p><strong>Dev.to walkthrough</strong> of building a voice AI platform with 28 modules in Python. (<a href="https://dev.to/ryanwinston_134/building-a-voice-ai-platform-with-28-modules-in-python-4hbm">Dev.to</a>)</p></li><li><p><strong>CTO field report</strong> on testing 184 AI text-to-speech models across quality, latency, and cost. (<a href="https://dev.to/gentleforge/i-tested-184-ai-text-to-speech-models-a-ctos-field-report-20h6">Dev.to</a>)</p></li><li><p><strong>Dev.to tutorial</strong> on simple text-to-speech in Python using PythonAIBrain. 
(<a href="https://dev.to/divyanshusinha136/text-to-speech-in-python-made-simple-with-pythonaibrain-2bc8">Dev.to</a>)</p></li></ul><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://voice-ai-newsletter.krisp.ai/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">Get the most important news in Voice AI delivered directly to your inbox every week</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div>]]></content:encoded></item><item><title><![CDATA[Krisp Voice Translation v3, New Siri AI and more]]></title><description><![CDATA[Voice AI weekly digest]]></description><link>https://voice-ai-newsletter.krisp.ai/p/krisp-voice-translation-v3-new-siri</link><guid isPermaLink="false">https://voice-ai-newsletter.krisp.ai/p/krisp-voice-translation-v3-new-siri</guid><dc:creator><![CDATA[Davit Baghdasaryan]]></dc:creator><pubDate>Mon, 15 Jun 2026 14:03:02 GMT</pubDate><enclosure url="https://substack-post-media.s3.amazonaws.com/public/images/69211f54-fa73-41cd-bf6a-05bae5057589_1450x730.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<h2>Top Updates &#128170;</h2><ul><li><p><strong>Krisp ships</strong> Voice Translation v3 with 96% accuracy in 61 languages and opens a self-serve developer API. 
(<a href="https://krisp.ai/blog/krisp-launches-v3-real-time-voice-translation/">Krisp</a> | <a href="https://krisp.ai/blog/introducing-voice-translation-api/">Krisp</a>)</p></li><li><p><strong>Apple launches</strong> Siri AI at WWDC with multi-turn conversations and a standalone app powered by Gemini. (<a href="https://www.apple.com/newsroom/2026/06/apple-introduces-siri-ai-a-profoundly-more-capable-and-personal-assistant/">Apple</a>)</p></li></ul><ul><li><p><strong>Google launches</strong> Gemini 3.5 Live Translate for real-time speech translation across 70+ languages. (<a href="https://blog.google/innovation-and-ai/models-and-research/gemini-models/gemini-live-3-5-translate/">Google</a>)</p></li><li><p><strong>Mistral is raising</strong> ~&#8364;3B at a &#8364;20B valuation, nearly doubling since its September round. (<a href="https://techcrunch.com/2026/06/12/mistral-is-rumored-to-be-raising-e3b-at-e20-valuation/">TechCrunch</a>)</p></li><li><p><strong>Equal AI raises</strong> $30M Series B to scale India&#8217;s voice-first AI assistant across a billion smartphones. (<a href="https://www.livemint.com/companies/start-ups/equal-ai-funding-series-b-prosus-ventures-tomales-bay-capital-consumer-ai-voice-ai-11781241081404.html">LiveMint</a>)</p></li><li><p><strong>NICE makes</strong> agentic AI the native architecture of its CX platform at NICE World 2026. (<a href="https://www.cmswire.com/contact-center/nice-makes-its-move-at-nice-world-2026-agentic-ai-is-now-the-architecture/">CMSWire</a>)</p></li><li><p><strong>Microsoft launches</strong> MAI-Voice-2, a TTS model supporting 10 languages and zero-shot voice cloning. (<a href="https://www.blockchain-council.org/ai/introducing-mai-voice-2/">Blockchain Council</a>)</p></li><li><p><strong>AI voice scams</strong> surged 1,210% in 2025, needing just 3 seconds of audio to clone any voice. 
(<a href="https://www.foxnews.com/tech/ai-voice-scams-clone-familys-voice">Fox News</a>)</p></li><li><p><strong>Google will save</strong> search images and audio by default for AI model training. (<a href="https://www.theverge.com/tech/947836/google-search-privacy-settings-images-audio">The Verge</a>)</p></li><li><p><strong>AI ambient scribes</strong> cut physician burnout by 21 percentage points in a Mass General Brigham study. (<a href="https://www.medicaldaily.com/ai-ambient-scribe-physician-burnout-mass-general-brigham-ucla-study-2026-475610">Medical Daily</a>)</p></li><li><p><strong>MindBio delivers</strong> AI voice kiosks that detect intoxication and fatigue from speech patterns. (<a href="https://www.streetwisereports.com/article/2026/06/12/mindbio-therapeutics-advances-ai-voice-tech-for-workplace-safety-in-growing-biotech-and-ai-detector-markets.html">StreetWise Reports</a>)</p></li><li><p><strong>Top Gear asks</strong> whether AI voice control in cars is the next big thing or a waste of time. (<a href="https://www.topgear.com/car-news/electric/ai-voice-control-cars-next-big-thing-or-a-complete-waste-time">Top Gear</a>)</p></li><li><p><strong>WSJ reports</strong> the job AI was supposed to kill now needs more humans than ever. (<a href="https://www.wsj.com/tech/ai/the-job-that-ai-was-supposed-to-kill-needs-more-humans-than-ever-0771e4cf">WSJ</a>)</p></li><li><p><strong>Voicegain hires</strong> a VP of Sales to push voice AI into healthcare call centers. (<a href="https://www.prweb.com/releases/voicegain-appoints-tracy-puleo-as-vice-president-of-sales-to-accelerate-voice-ai-growth-in-healthcare-call-centers-302793704.html">PRWeb</a>)</p></li><li><p><strong>Speechmatics named</strong> HackerNoon&#8217;s Company of the Week for speech AI innovation. 
(<a href="https://hackernoon.com/meet-speechmatics-hackernoon-company-of-the-week">HackerNoon</a>)</p></li><li><p><strong>Voice AI adoption</strong> crosses an enterprise threshold in contact centers with measurable ROI. (<a href="https://www.cxtoday.com/contact-center/why-voice-ai-adoption-is-accelerating-in-2026/">CX Today</a>)</p></li><li><p><strong>India positions</strong> itself as the world&#8217;s CX leader as voice AI reshapes its call center industry. (<a href="https://www.expresscomputer.in/news/indias-voice-ai-opportunity-from-the-worlds-call-centre-to-the-worlds-cx-leader/135823/">Express Computer</a>)</p></li></ul><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://voice-ai-newsletter.krisp.ai/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe now&quot;,&quot;action&quot;:null,&quot;class&quot;:&quot;button-wrapper&quot;}" data-component-name="ButtonCreateButton"><a class="button primary button-wrapper" href="https://voice-ai-newsletter.krisp.ai/subscribe?"><span>Subscribe now</span></a></p><div><hr></div><h2>Engineering Corner &#128526;</h2><ul><li><p><strong>Kyutai shows</strong> how RL post-training improves turn-taking and backchanneling in full-duplex voice models. (<a href="https://kyutai.org/blog/2026-06-10-interactivity">Kyutai</a>)</p></li></ul><ul><li><p><strong>Treble and Hugging Face</strong> launch FFASR, the first open benchmark for far-field speech recognition. (<a href="https://www.newsfilecorp.com/release/300719/Treble-Technologies-and-Hugging-Face-Address-Voice-AIs-Unspoken-Dilemma-With-Groundbreaking-Benchmark-of-ASR-Models">Newsfilecorp</a>)</p></li><li><p><strong>Red Hat publishes</strong> a guide to building a local voice agent with OpenShift AI. 
(<a href="https://developers.redhat.com/articles/2026/06/08/build-local-voice-agent-red-hat-openshift-ai">Red Hat Developer</a>)</p></li><li><p><strong>DrivenData announces</strong> winners of &#8220;On Top of Pasketti,&#8221; a children&#8217;s speech recognition challenge. (<a href="https://drivendata.co/blog/on-top-of-pasketti-winners">DrivenData</a>)</p></li><li><p><strong>Dev.to tutorial</strong> on extracting conversation intelligence from audio beyond simple dictation. (<a href="https://dev.to/nfc/beyond-dictation-how-to-extract-true-conversation-intelligence-from-audio-in-seconds-21ep">Dev.to</a>)</p></li><li><p><strong>Dev.to tutorial</strong> on building voice agents that send follow-up emails via Nylas. (<a href="https://dev.to/qasim157/voice-agents-that-follow-up-by-email-5ej6">Dev.to</a>)</p></li><li><p><strong>Blog tutorial</strong> covers building an ElevenLabs + n8n voice AI sales agent end to end. (<a href="https://whoisalfaz.me/blog/elevenlabs-n8n-voice-ai-sales-agent/">whoisalfaz.me</a>)</p></li><li><p><strong>ParseJargon paper</strong> introduces real-time jargon translation for online meetings using LLMs. 
(<a href="https://arxiv.org/abs/2508.10239">arXiv</a>)</p></li></ul><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://voice-ai-newsletter.krisp.ai/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">Get the most important news in Voice AI delivered directly to your inbox every week</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div>]]></content:encoded></item><item><title><![CDATA[Grok Powers Vapi, Gemma 4 Brings Audio to Your Laptop]]></title><description><![CDATA[Voice AI weekly digest]]></description><link>https://voice-ai-newsletter.krisp.ai/p/grok-powers-vapi-gemma-4-brings-audio</link><guid isPermaLink="false">https://voice-ai-newsletter.krisp.ai/p/grok-powers-vapi-gemma-4-brings-audio</guid><dc:creator><![CDATA[Davit Baghdasaryan]]></dc:creator><pubDate>Mon, 08 Jun 2026 14:02:15 GMT</pubDate><enclosure url="https://substack-post-media.s3.amazonaws.com/public/images/569c37ee-e821-43ec-bcb5-a157d010e893_1896x1054.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<h2>Top Updates &#128170;</h2><ul><li><p><strong>xAI brings Grok</strong> TTS and STT to Vapi, letting developers build voice agents with Grok&#8217;s speech models. (<a href="https://x.ai/news/grok-vapi">xAI</a>)</p></li><li><p><strong>Sesame launches</strong> its iOS app with four conversational voice agents, built by the co-founders of Oculus. 
(<a href="https://techcrunch.com/2026/05/28/sesame-the-conversational-ai-startup-from-oculus-founders-launches-its-ios-app/">TechCrunch</a>)</p></li></ul><ul><li><p><strong>Google releases</strong> Gemma 4 12B, an open-source multimodal model with native audio that runs on a 16GB laptop. (<a href="https://venturebeat.com/technology/googles-new-open-source-gemma-4-12b-analyzes-audio-video-and-runs-entirely-locally-on-a-typical-16gb-enterprise-laptop">VentureBeat</a>)</p></li><li><p><strong>AethexAI raises $3M</strong> to build voice AI infrastructure for Africa and the Middle East. (<a href="https://techcrunch.com/2026/06/03/these-two-founders-left-goldman-and-meta-to-build-voice-ai-for-markets-everyone-else-overlooked/">TechCrunch</a>)</p></li><li><p><strong>Aircall acquires</strong> Piper AI to add revenue intelligence and sales automation to its voice platform. (<a href="https://www.webpronews.com/aircall-buys-piper-ai-in-bid-to-own-the-full-sales-revenue-cycle/">WebProNews</a>)</p></li><li><p><strong>8x8 launches Pulse</strong>, a conversational intelligence tool that turns calls and chats into actionable business insights. (<a href="https://www.businesswire.com/news/home/20260603379786/en/8x8-Introduces-8x8-Pulse-Conversational-Intelligence-Built-for-Where-Decisions-Are-Made">BusinessWire</a>)</p></li><li><p><strong>Google rolls out</strong> real-time deepfake voice detection on Android to catch AI scam calls as they happen. (<a href="https://www.techbuzz.ai/articles/google-deploys-ai-to-detect-deepfake-voice-scams-in-real-time">TechBuzz</a>)</p></li><li><p><strong>Microsoft Edge adds</strong> on-device speech recognition and translation APIs powered by local AI models. (<a href="https://blogs.windows.com/msedgedev/2026/06/02/expanding-on-device-ai-in-microsoft-edge-new-models-and-apis-for-the-web/">Microsoft</a>)</p></li><li><p><strong>McDonald&#8217;s pilots</strong> ArchIQ, a voice AI drive-thru that handles 90% of orders without human help. 
(<a href="https://www.theedadvocate.org/how-mcdonalds-ai-drive-thru-system-could-change-fast-food-forever/">TheEdAdvocate</a>)</p></li><li><p><strong>Peak XV eyes</strong> a $10M round in Ringg AI as Indian voice agent startups gain momentum. (<a href="https://m.economictimes.com/tech/funding/peak-xv-in-talks-to-back-ringg-ai-sources-say-as-voice-ai-gains-attention/articleshow/131488657.cms">Economic Times</a>)</p></li><li><p><strong>Deepgram partners</strong> with Fortanix to run voice AI on-premises using NVIDIA confidential computing. (<a href="https://itnerd.blog/2026/06/01/deepgram-delivers-private-voice-ai-to-regulated-industries-with-on-premises-deployments-powered-by-fortanix-confidential-ai-and-nvidia-confidential-computing/">ITNerd</a>)</p></li><li><p><strong>Americans lost $893M</strong> to AI scams last year, with voice cloning attacks leading the surge. (<a href="https://www.the-independent.com/news/world/americas/crime/ai-scams-americans-lost-millions-b2984788.html">The Independent</a>)</p></li><li><p><strong>Equity demands</strong> Fish Audio remove unauthorized AI clones of performers&#8217; voices from its platform. (<a href="https://www.equity.org.uk/news/2026/equity-demands-fish-audio-removes-unauthorised-ai-voices">Equity</a>)</p></li><li><p><strong>Sarvam AI opens</strong> its multilingual voice agents platform to the public, covering 11 Indian languages. (<a href="https://letsdatascience.com/news/sarvam-ai-opens-voice-agents-platform-to-public-a32ad441">LetDataScience</a>)</p></li><li><p><strong>Ubuntu plans</strong> to ship AI-powered speech-to-text across all text fields in the OS. (<a href="https://www.omgubuntu.co.uk/2026/06/ubuntu-speech-to-text-ai/amp">OMG Ubuntu</a>)</p></li><li><p><strong>ENCO debuts</strong> EnSpeak, a real-time voice-to-voice translation system for live venues and classrooms. 
(<a href="https://ravepubs.com/enco-brings-real-time-voice-translation-to-proav-with-infocomm-debut-of-enspeak/">RavePubs</a>)</p></li><li><p><strong>Broadvoice launches</strong> GoEngage and AI Analyst, adding speech-to-speech voice AI to its contact center. (<a href="http://www.smartcustomerservice.com/Articles/News-Briefs/Broadvoice-Launches-GoEngage-and-AI-Analyst-175105.aspx">SmartCustomerService</a>)</p></li><li><p><strong>ElevenLabs opens</strong> a pop-up store in NYC where every part of the experience is run by a voice agent. (<a href="https://letsdatascience.com/news/elevenlabs-runs-nyc-pop-up-featuring-voice-agents-a46404fb">LetDataScience</a>)</p></li><li><p><strong>RingCentral leads</strong> the G2 Summer 2026 AI VoIP category with 137 product badges. (<a href="https://www.ringcentral.com/us/en/blog/ringcental-leads-g2-summer-2026-ai-voip-category/">RingCentral</a>)</p></li><li><p><strong>In2ition AI launches</strong> Iris, an always-on AI companion that joins live meetings instead of just transcribing them. (<a href="https://www.prnewswire.com/news-releases/in2ition-ai-launches-iris-the-always-on-ai-companion-that-participates-in-conversations-instead-of-analyzing-them-after-they-end-302789131.html">PRNewswire</a>)</p></li><li><p><strong>Astreya integrates</strong> 3CLogic voice AI into its ServiceNow-based IT service desk. 
(<a href="https://www.prnewswire.com/news-releases/astreya-expands-ai-first-service-desk-with-3clogic-integration-unifying-voice-ai-and-itsm-on-servicenow-302788520.html">PRNewswire</a>)</p></li></ul><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://voice-ai-newsletter.krisp.ai/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe now&quot;,&quot;action&quot;:null,&quot;class&quot;:&quot;button-wrapper&quot;}" data-component-name="ButtonCreateButton"><a class="button primary button-wrapper" href="https://voice-ai-newsletter.krisp.ai/subscribe?"><span>Subscribe now</span></a></p><div><hr></div><h2>Engineering Corner &#128526;</h2><ul><li><p><strong>NVIDIA publishes</strong> a fine-tuning guide for Nemotron 3.5 ASR, its 600M-param streaming model covering 40 languages. (<a href="https://huggingface.co/blog/nvidia/fine-tuning-nemotron-35-asr">Hugging Face</a>)</p></li></ul><ul><li><p><strong>Higgs Audio v3</strong> is a 4B-param chat-native TTS model supporting 102 languages with zero-shot voice cloning. (<a href="https://www.lmsys.org/blog/2026-06-04-higgs-audio-v3-tts/">LMSYS</a>)</p></li><li><p><strong>MisoTTS</strong> is an 8B emotive TTS model with open weights that claims 110ms latency. (<a href="https://www.marktechpost.com/2026/06/04/miso-labs-releases-misotts-an-8b-emotive-text-to-speech-model-with-open-weights/">MarkTechPost</a>)</p></li><li><p><strong>Audio-Interaction</strong> is a 3B open-source model that listens nonstop and decides every 0.4 seconds whether to speak. (<a href="https://the-decoder.com/new-open-source-voice-model-listens-nonstop-and-decides-every-0-4-seconds-whether-to-speak-or-stay-silent/">The Decoder</a>)</p></li><li><p><strong>pyannote.ai&#8217;s Bredin</strong> explains how speaker diarization makes voice AI understand conversations, not just transcribe them. 
(<a href="https://www.startuphub.ai/ai-news/ai-research/2026/pyannoteai-s-bredin-on-building-conversational-voice-ai">StartupHub</a>)</p></li><li><p><strong>HackerNoon walks through</strong> how to transfer an AI voice agent to a human without losing context. (<a href="https://hackernoon.com/the-warm-handoff-how-to-transfer-an-ai-voice-agent-to-a-human-without-losing-context">HackerNoon</a>)</p></li><li><p><strong>HackerNoon lists</strong> the 7 best voice agent testing platforms for 2026. (<a href="https://hackernoon.com/7-of-the-best-voice-agent-testing-platforms-in-2026">HackerNoon</a>)</p></li><li><p><strong>TechStartups breaks down</strong> how speech datasets for AI are built, what they contain, and where they fail. (<a href="https://techstartups.com/2026/06/01/speech-datasets-for-ai-what-they-contain-how-theyre-built-and-where-they-break/">TechStartups</a>)</p></li></ul><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://voice-ai-newsletter.krisp.ai/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">Get the most important news in Voice AI delivered directly to your inbox every week</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div>]]></content:encoded></item><item><title><![CDATA[Anthropic's Trillion-Dollar Moment]]></title><description><![CDATA[Voice AI weekly digest: massive funding, major platform partnerships, translation breakthroughs, and a growing push into wearables, healthcare, and enterprise 
software.]]></description><link>https://voice-ai-newsletter.krisp.ai/p/anthropics-trillion-dollar-moment</link><guid isPermaLink="false">https://voice-ai-newsletter.krisp.ai/p/anthropics-trillion-dollar-moment</guid><dc:creator><![CDATA[Davit Baghdasaryan]]></dc:creator><pubDate>Mon, 01 Jun 2026 13:45:58 GMT</pubDate><enclosure url="https://substack-post-media.s3.amazonaws.com/public/images/aac0e4f8-a458-4a67-a9bd-5157801ebcb1_1836x1088.webp" length="0" type="image/jpeg"/><content:encoded><![CDATA[<h2>Top Updates &#128170;</h2><ul><li><p>Anthropic closes a Series H near a $965B valuation, landing alongside its Claude Opus 4.8 launch. (<a href="https://techcrunch.com/2026/05/28/anthropic-raises-65-billion-nears-1t-valuation-ahead-of-ipo/">TechCrunch</a>)</p></li><li><p>Parloa deploys its $350M war chest into partnerships with SAP, Microsoft, OpenAI, Five9, and Epic. (<a href="https://thenextweb.com/news/parloa-turns-its-350-million-war-chest-into-a-partnership-web-spanning-sap-microsoft-and-openai">The Next Web</a>)</p></li><li><p>Exclusive: Krisp scales its infra deployment paradigm (<a href="https://www.youtube.com/watch?v=09plsaCZAAU">AIM Network</a>)</p></li><li><p>Greenhouse acquires Ezra AI Labs, folding a voice-AI interviewer into its hiring platform. (<a href="https://www.prnewswire.com/news-releases/greenhouse-completes-acquisition-of-ezra-ai-labs-bringing-conversational-ai-to-the-hiring-process-302782372.html">PR Newswire</a>)</p></li><li><p>Alibaba Updates Speech Translation Model, Triples Language Coverage (<a href="https://slator.com/alibaba-speech-translation-model-triples-language-coverage/">Slator</a>)</p></li><li><p>StepFun ships StepAudio 2.5 Realtime, an end-to-end speech LLM with roleplay RLHF and paralinguistic perception. 
(<a href="https://www.marktechpost.com/2026/05/24/stepfun-releases-stepaudio-2-5-realtime-an-end-to-end-voice-model-with-roleplay-specific-rlhf-and-paralinguistic-comprehension/">MarkTechPost</a>)</p></li><li><p>COLDI launches a turnkey platform for integrated AI voice agents aimed at lead management. (<a href="https://www.prnewswire.com/news-releases/coldi-unveils-turnkey-platform-for-integrated-ai-voice-agents-302781678.html">PR Newswire</a>)</p></li><li><p>What the Language Solutions and AI Market Should Take Away From Google I/O (<a href="https://slator.com/language-solutions-ai-market-take-aways-google/">Slator</a>)</p></li><li><p>Palabra.ai crosses $1M ARR, a 17x six-month climb for its real-time speech-to-speech translator. (<a href="https://aithority.com/uncategorized/palabra-ai-real-time-ai-voice-translator-hits-1m-arr-grows-17x-in-six-months/">AiThority</a>)</p></li><li><p>iFlytek debuts 40g AI glasses with an on-device GlassClaw agent and live translation in 122 languages. (<a href="https://longbridge.com/en/news/287989104">Longbridge</a>)</p></li><li><p>iFLYTEK unveils AI Recorder S6 with long-range voice recording and smart summaries (<a href="https://markets.financialcontent.com/stocks/article/abnewswire-2026-5-29-iflytek-unveils-ai-recorder-s6-with-long-range-voice-recording-smart-summaries-and-enterprise-grade-data-security">FinancialContent</a>)</p></li><li><p>An ElevenLabs-linked deal licenses Stan Lee&#8217;s voice and likeness for AI-narrated audiobooks and comics. (<a href="https://kotaku.com/stan-lee-marvel-voice-likeness-rights-ai-elevenlabs-2000699882">Kotaku</a>)</p></li><li><p>What Apple&#8217;s New AI Glasses Mean for the Future of Wearables. (<a href="https://www.geeky-gadgets.com/apple-ai-glasses-features/">Geeky Gadgets</a>)</p></li><li><p>A new study shows inaudible audio commands can hijack AI voice models unheard by humans. 
(<a href="https://decrypt.co/369042/inaudible-audio-attacks-hijack-ai-voice-models">Decrypt</a>)</p></li><li><p>AI Studios Launches Context-Aware Expressive TTS with 1,000+ AI Voices (<a href="https://markets.businessinsider.com/news/stocks/ai-studios-launches-context-aware-expressive-tts-with-1-000-ai-voices-1036193321">Business Insider</a>)</p></li><li><p>What healthcare organizations need to get right about AI transcription. (<a href="https://natlawreview.com/article/ai-transcription-tools-health-care-what-house-counsel-needs-get-right">National Law Review</a>)</p></li></ul><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://voice-ai-newsletter.krisp.ai/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe now&quot;,&quot;action&quot;:null,&quot;class&quot;:&quot;button-wrapper&quot;}" data-component-name="ButtonCreateButton"><a class="button primary button-wrapper" href="https://voice-ai-newsletter.krisp.ai/subscribe?"><span>Subscribe now</span></a></p><div><hr></div><h2>Engineering Corner &#128526;</h2><ul><li><p>OmniVoice Studio ships as a local, open-source ElevenLabs alternative with cloning, dubbing, diarization, and an MCP server. (<a href="https://www.marktechpost.com/2026/05/26/meet-omnivoice-studio-a-local-open-source-alternative-to-elevenlabs/">MarkTechPost</a>)</p></li><li><p>A field guide to production voice agents tackles sub-300ms latency with LiveKit and WebRTC. (<a href="https://dev.to/dishant_sethi/building-production-voice-ai-agents-latency-architecture-and-what-nobody-tells-you-3jhj">dev.to</a>)</p></li><li><p>A walkthrough adds Gemma 4 speech recognition to a .NET desktop app via a llama-server sidecar. (<a href="https://dev.to/mdemin729/adding-gemma-4-speech-recognition-to-a-net-desktop-app-the-llama-server-sidecar-that-survived-298j">dev.to</a>)</p></li><li><p>Vaani pairs speech recognition with Indian Sign Language on Android using MediaPipe. 
(<a href="https://dev.to/kinara2020/vaani-ai-making-communication-more-inclusive-with-speech-recognition-and-indian-sign-language-1i55">dev.to</a>)</p></li><li><p>FlowSpeech offers context-aware TTS with controllable emotion, pacing, and pauses across 30+ voices. (<a href="https://flowspeech.io/">flowspeech.io</a>)</p></li><li><p>Vowen runs fully offline STT on Windows and macOS, free and privacy-first. (<a href="https://www.majorgeeks.com/files/details/vowen.html">MajorGeeks</a>)</p></li></ul><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://voice-ai-newsletter.krisp.ai/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">Get the most important news in Voice AI delivered directly to your inbox every week</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div>]]></content:encoded></item><item><title><![CDATA[Google I/O Goes Voice-First, Corti Beats OpenAI on Medical STT]]></title><description><![CDATA[Voice AI weekly digest]]></description><link>https://voice-ai-newsletter.krisp.ai/p/google-io-goes-voice-first-corti</link><guid isPermaLink="false">https://voice-ai-newsletter.krisp.ai/p/google-io-goes-voice-first-corti</guid><dc:creator><![CDATA[Davit Baghdasaryan]]></dc:creator><pubDate>Mon, 25 May 2026 14:03:05 GMT</pubDate><enclosure url="https://substack-post-media.s3.amazonaws.com/public/images/515b6cef-3dc9-4faa-be44-a239892b0b3d_1290x966.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<h2>Top Updates &#128170;</h2><ul><li><p><strong>Google adds voice</strong> 
to Gmail, Docs and Keep, letting users search their inbox and dictate by speaking instead of typing. (<a href="https://techcrunch.com/2026/05/19/you-can-now-talk-to-your-gmail-inbox-as-seen-at-google-io-2026/">TechCrunch</a>)</p></li></ul><ul><li><p><strong>Google unveils</strong> audio-powered smart glasses at I/O 2026, taking on Meta in the wearable AI race. (<a href="https://techcrunch.com/2026/05/19/google-takes-a-page-out-of-metas-book-announces-new-audio-powered-smart-glasses-at-io-2026/">TechCrunch</a>)</p></li><li><p><strong>Spotify launches</strong> an ElevenLabs-powered tool that lets authors create audiobooks from text. (<a href="https://techcrunch.com/2026/05/21/spotify-launches-an-elevenlabs-powered-audiobook-creation-tool/">TechCrunch</a>)</p></li><li><p><strong>Corti&#8217;s Symphony model</strong> outperforms OpenAI&#8217;s Whisper on medical terminology accuracy for speech-to-text. (<a href="https://venturebeat.com/technology/cortis-new-symphony-for-speech-to-text-model-beats-openai-at-medical-terminology-accuracy-highlighting-the-value-of-specialized-ai">VentureBeat</a>)</p></li><li><p><strong>Zoom opens</strong> its AI Translator and Summarizer as standalone APIs for third-party developers. (<a href="https://news.zoom.com/zoom-mcp-expanded-capabilities/">Zoom</a> | <a href="https://slator.com/zoom-ai-services-translator-summarizer/">Slator</a>)</p></li><li><p><strong>Twilio shares surged</strong> 60% as voice AI adoption accelerates across its communications platform. (<a href="https://sebastianbarros.substack.com/p/twilio-shares-surged-60-on-voice">Sebastian Barros</a>)</p></li><li><p><strong>Zendesk expands</strong> its AI agents across ChatGPT, Gemini, voice and messaging channels. (<a href="https://www.techradar.com/pro/zendesk-expands-ai-agents-across-chatgpt-gemini-voice-and-messaging">TechRadar</a>)</p></li><li><p><strong>Kardome ships</strong> its voice AI in LG OLED TVs, reaching mass-market consumers for the first time. 
(<a href="https://audioxpress.com/news/kardome-voice-ai-reaches-mass-market-with-lg-oled-tv-deployments">AudioXpress</a>)</p></li><li><p><strong>Amazon&#8217;s Alexa</strong> can now generate full podcast episodes on any topic you ask for. (<a href="https://www.techbuzz.ai/articles/amazon-s-new-alexa-powered-feature-can-generate-podcast-episodes">TechBuzz</a>)</p></li><li><p><strong>Alibaba releases</strong> Qwen3.5 LiveTranslate Flash, a real-time interpreter covering 60 languages at 2.8-second latency. (<a href="https://www.marktechpost.com/2026/05/20/alibaba-qwen-team-introduces-qwen3-5-livetranslate-flash-real-time-multimodal-interpretation-across-60-languages-at-2-8-second-latency/">MarkTechPost</a>)</p></li><li><p><strong>NTSB shuts down</strong> its public docket after people used AI to recreate dead pilots&#8217; voices from spectrograms. (<a href="https://www.engadget.com/2180049/people-used-ai-to-recreate-the-voices-of-pilots-killed-in-a-plane-crash/">Engadget</a>)</p></li><li><p><strong>Columbia researchers</strong> pass the first human trial of a brain-controlled hearing system that isolates one speaker in noise. (<a href="https://www.medscape.com/viewarticle/brain-controlled-hearing-system-passes-first-human-trial-2026a1000gcc">Medscape</a>)</p></li><li><p><strong>iProov launches</strong> a deepfake detection system designed specifically for enterprise video calls. (<a href="https://financefeeds.com/iproov-launches-deepfake-detection-system-for-enterprise-video-calls/">FinanceFeeds</a>)</p></li><li><p><strong>Halsa Global launches</strong> Voice IQ, a Salesforce-native conversational AI for enterprise sales. (<a href="https://www.newswire.com/news/halsa-global-launches-voice-iq-a-salesforce-native-conversational-22784148">Newswire</a>)</p></li><li><p><strong>Korean tech firms</strong> double down on voice AI with localized models and in-car assistants. 
(<a href="https://www.koreatimes.co.kr/business/tech-science/20260522/tech-firms-double-down-on-voice-ai-as-next-battleground-emerges">Korea Times</a>)</p></li><li><p><strong>Tamber launches</strong> its AI music creation platform after raising $5M from Adobe Ventures. (<a href="https://www.musicbusinessworldwide.com/after-raising-5m-adobe-backed-tamber-officially-launches-its-ai-music-making-platform/">Music Business Worldwide</a>)</p></li><li><p><strong>TalkSign launches</strong> Palm 1.0 and Echo 1.0, AI models for sign language recognition and generation. (<a href="https://techcabal.com/2026/05/20/talksign-launches-ai-powered-palm-1-0-and-echo-1-0/">TechCabal</a>)</p></li><li><p><strong>CMU research shows</strong> adding audio cues like typing sounds makes AI feel more human but also more rude. (<a href="https://techxplore.com/news/2026-05-audio-cues-ai-human-users.html">TechXplore</a>)</p></li><li><p><strong>Office workers shift</strong> from typing to voice dictation as AI transcription apps go mainstream. (<a href="https://theweek.com/tech/the-changing-sounds-of-the-office">The Week</a>)</p></li><li><p><strong>Synthflow AI handles</strong> over 5 million calls a month as call centres move to voice AI at scale. (<a href="https://tech.eu/2026/05/22/the-call-centre-enters-the-voice-ai-era/">Tech.eu</a>)</p></li></ul><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://voice-ai-newsletter.krisp.ai/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe now&quot;,&quot;action&quot;:null,&quot;class&quot;:&quot;button-wrapper&quot;}" data-component-name="ButtonCreateButton"><a class="button primary button-wrapper" href="https://voice-ai-newsletter.krisp.ai/subscribe?"><span>Subscribe now</span></a></p><div><hr></div><h2>Engineering Corner &#128526;</h2><ul><li><p><strong>AWS publishes</strong> a guide to building real-time voice apps with SageMaker AI and vLLM using bidirectional streaming. 
(<a href="https://aws.amazon.com/blogs/machine-learning/build-real-time-voice-applications-with-amazon-sagemaker-ai-and-vllm/">AWS Blog</a>)</p></li></ul><ul><li><p><strong>VoiceBox</strong> is an open-source voice cloning app that runs locally from 3 seconds of audio with no cloud uploads. (<a href="https://www.techtimes.com/articles/316850/20260519/voicebox-clones-any-voice-3-seconds-audio-runs-locally-free-has-no-consent-lock.htm">TechTimes</a>)</p></li><li><p><strong>Vowen</strong> is a free offline voice dictation tool for Windows and macOS that transcribes speech system-wide. (<a href="https://www.majorgeeks.com/files/details/vowen.html">MajorGeeks</a>)</p></li><li><p><strong>NoteSnip</strong> turns video transcripts into source-grounded AI study notes across YouTube, podcasts and PDFs. (<a href="https://dev.to/_993f2d61f0282f6943ea3/from-video-transcripts-to-source-grounded-ai-notes-a-practical-look-at-notesnip-33in">Dev.to</a>)</p></li><li><p><strong>IEEE Spectrum covers</strong> how Maori researchers are building indigenous AI voice models to preserve te reo Maori. (<a href="https://spectrum.ieee.org/indigenous-ai-voice-models-maori">IEEE Spectrum</a>)</p></li><li><p><strong>Memeburn ranks</strong> the best AI voice generators of 2026 by use case, from cloning to e-learning. 
(<a href="https://memeburn.com/best-ai-voice-generator/">Memeburn</a>)</p></li></ul><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://voice-ai-newsletter.krisp.ai/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">Get the most important news in Voice AI delivered directly to your inbox every week</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div>]]></content:encoded></item><item><title><![CDATA[Customer Service Hiring Is Surging. So Is Voice AI]]></title><description><![CDATA[Voice AI weekly digest]]></description><link>https://voice-ai-newsletter.krisp.ai/p/customer-service-hiring-is-surging</link><guid isPermaLink="false">https://voice-ai-newsletter.krisp.ai/p/customer-service-hiring-is-surging</guid><dc:creator><![CDATA[Davit Baghdasaryan]]></dc:creator><pubDate>Mon, 18 May 2026 14:03:30 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!bV3W!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F33f49e81-df6b-43c2-b40e-ba17d750ad1b_1248x1536.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Customer service job postings are up ~8% YoY. 
More voice AI doesn&#8217;t mean fewer human agents - it means more conversations.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!bV3W!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F33f49e81-df6b-43c2-b40e-ba17d750ad1b_1248x1536.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!bV3W!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F33f49e81-df6b-43c2-b40e-ba17d750ad1b_1248x1536.png 424w, https://substackcdn.com/image/fetch/$s_!bV3W!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F33f49e81-df6b-43c2-b40e-ba17d750ad1b_1248x1536.png 848w, https://substackcdn.com/image/fetch/$s_!bV3W!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F33f49e81-df6b-43c2-b40e-ba17d750ad1b_1248x1536.png 1272w, https://substackcdn.com/image/fetch/$s_!bV3W!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F33f49e81-df6b-43c2-b40e-ba17d750ad1b_1248x1536.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!bV3W!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F33f49e81-df6b-43c2-b40e-ba17d750ad1b_1248x1536.png" width="1248" height="1536" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/33f49e81-df6b-43c2-b40e-ba17d750ad1b_1248x1536.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1536,&quot;width&quot;:1248,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:950139,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://voice-ai-newsletter.krisp.ai/i/198210285?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F33f49e81-df6b-43c2-b40e-ba17d750ad1b_1248x1536.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!bV3W!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F33f49e81-df6b-43c2-b40e-ba17d750ad1b_1248x1536.png 424w, https://substackcdn.com/image/fetch/$s_!bV3W!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F33f49e81-df6b-43c2-b40e-ba17d750ad1b_1248x1536.png 848w, https://substackcdn.com/image/fetch/$s_!bV3W!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F33f49e81-df6b-43c2-b40e-ba17d750ad1b_1248x1536.png 1272w, https://substackcdn.com/image/fetch/$s_!bV3W!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F33f49e81-df6b-43c2-b40e-ba17d750ad1b_1248x1536.png 1456w" sizes="100vw" fetchpriority="high"></picture></div></a></figure></div>
<h2>Top Updates &#128170;</h2><ul><li><p>Vapi raises $50M for its voice AI agent platform, now valued at $500M. (<a href="https://techcrunch.com/2026/05/12/vapi-hits-500m-valuation-as-amazon-ring-chose-its-ai-platform-over-40-rivals/">TechCrunch</a>)</p></li></ul><ul><li><p>Thinking Machines previews voice+video models that can listen and talk at the same time. (<a href="https://venturebeat.com/technology/thinking-machines-shows-off-preview-of-near-realtime-ai-voice-and-video-conversation-with-new-interaction-models">VentureBeat</a>)</p></li><li><p>Wispr seeks $260M at a $2B valuation for its voice dictation app. 
(<a href="https://www.bloomberg.com/news/articles/2026-05-12/ai-dictation-startup-wispr-in-funding-talks-at-2-billion-value">Bloomberg</a>)</p></li><li><p>OpenAI acquires Weights.gg, a voice cloning startup, and folds the team internally. (<a href="https://www.itvoice.in/openai-quietly-acquires-voice-cloning-startup-weights-gg-to-boost-audio-ai-capabilities">ITVoice</a>)</p></li><li><p>Medicare will reimburse AI voice agents that manage chronic care patients. (<a href="https://www.webpronews.com/medicares-quiet-bet-on-ai-agents-that-could-reshape-chronic-care/">WebProNews</a>)</p></li><li><p>Better.com&#8217;s voice agent handles 35% of mortgage calls without human involvement. (<a href="https://www.pymnts.com/artificial-intelligence-2/2026/better-coms-ai-agent-resolved-35-of-mortgage-calls-alone/">PYMNTS</a>)</p></li><li><p>Bajaj Finance replaces 1,500 calling agents with 10 AI voice bots. (<a href="https://techstory.in/10-ai-bots-replace-1500-employees-at-bajaj-finance-as-automation-wave-intensifies/">TechStory</a>)</p></li><li><p>Rivian rolls out a voice assistant across its R1 and R2 vehicles. (<a href="https://insideevs.com/news/795539/rivian-assistant-launch-r1-r2-2026/">InsideEVs</a>)</p></li><li><p>Quiq adds voice AI to its platform and rebrands for enterprise scale. (<a href="https://customerservicemanager.com/quiq-expands-voice-ai-and-rebrands-to-focus-on-scaled-enterprise-deployments/">CSM Magazine</a>)</p></li><li><p>ElevenLabs signs McConaughey, Caine, and Minnelli for AI voice partnerships. (<a href="https://deadline.com/2026/05/elevenlabs-mati-staniszewski-matthew-mcconaughey-ai-audio-1236900840/">Deadline</a>)</p></li><li><p>Activate invests in ElevenLabs to help grow its India business. (<a href="https://www.businesstoday.in/technology/story/activate-invests-in-elevenlabs-bets-big-on-indias-voice-ai-opportunity-531498-2026-05-14">BusinessToday</a>)</p></li><li><p>RingCentral named Leader by IDC, Omdia, and Metrigy for customer engagement. 
(<a href="https://www.ringcentral.com/us/en/blog/leader-analyst-reports-future-customer-engagement/">RingCentral Blog</a>)</p></li><li><p>Smallest AI runs its TTS on Tenstorrent chips at 4x lower cost. (<a href="https://m.thewire.in/article/ptiprnews/smallest-ai-and-tenstorrent-partnership-democratises-voice-ai-4x-reduction-in-cost-through-hardware-acceleration/amp">The Wire</a>)</p></li><li><p>MindBio detects intoxication from voice alone using AI speech analysis. (<a href="https://markets.businessinsider.com/news/stocks/networknews-audio-announces-audio-press-release-apr-discussing-combining-artificial-intelligence-with-speech-analysis-to-detect-intoxication-1036163396">BusinessInsider</a>)</p></li></ul><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://voice-ai-newsletter.krisp.ai/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe now&quot;,&quot;action&quot;:null,&quot;class&quot;:&quot;button-wrapper&quot;}" data-component-name="ButtonCreateButton"><a class="button primary button-wrapper" href="https://voice-ai-newsletter.krisp.ai/subscribe?"><span>Subscribe now</span></a></p><div><hr></div><h2>Engineering Corner &#128526;</h2><ul><li><p>AWS adds Qwen3 speech models to SageMaker JumpStart for TTS and ASR. (<a href="https://aws.amazon.com/about-aws/whats-new/2026/05/speech-models-on-sagemaker-jumpstart/">AWS</a>)</p></li><li><p>Foundry Local v1.1 adds live speech-to-text that runs entirely on-device. (<a href="https://devblogs.microsoft.com/foundry/foundry-local-v1-1/">Microsoft DevBlogs</a>)</p></li></ul><ul><li><p>Supertone open-sources Supertonic v3, an on-device TTS supporting 31 languages. (<a href="https://www.marktechpost.com/2026/05/15/supertone-releases-supertonic-v3-on-device-text-to-speech-model-with-31-language-support-fewer-reading-failures-and-expression-tags/">MarkTechPost</a>)</p></li><li><p>Coval publishes open TTS benchmarks comparing speed and accuracy across major providers. 
(<a href="https://benchmarks.coval.ai/tts">Coval</a>)</p></li><li><p>OpenMOSS gets a C++ port for easy local deployment without Python. (<a href="https://startupfortune.com/openmoss-gets-a-c-port-as-local-voice-ai-chases-easier-deployment/">StartupFortune</a>)</p></li><li><p>ThirdReality ships a $70 open-source voice assistant for Home Assistant. (<a href="https://www.prweb.com/releases/thirdreality-launches-voice--music-assistant-dev-edition-302767779.html">PRWeb</a>)</p></li><li><p>Monologue adds CLI and MCP support for piping voice dictation into AI agents. (<a href="https://www.macstories.net/notes/monologue-notes-cli/">MacStories</a>)</p></li></ul><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://resources.krisp.ai/fullband-2025&quot;,&quot;text&quot;:&quot;Register now&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://resources.krisp.ai/fullband-2025"><span>Register now</span></a></p><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://voice-ai-newsletter.krisp.ai/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">Get the most important news in Voice AI delivered directly to your inbox every week</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div>]]></content:encoded></item><item><title><![CDATA[Updates from Krisp, OpenAI, ServiceNow and much more!]]></title><description><![CDATA[Voice AI weekly 
digest]]></description><link>https://voice-ai-newsletter.krisp.ai/p/updates-from-krisp-openai-servicenow</link><guid isPermaLink="false">https://voice-ai-newsletter.krisp.ai/p/updates-from-krisp-openai-servicenow</guid><dc:creator><![CDATA[Davit Baghdasaryan]]></dc:creator><pubDate>Mon, 11 May 2026 14:00:36 GMT</pubDate><enclosure url="https://substack-post-media.s3.amazonaws.com/public/images/ae2e60e3-d0bb-4432-bdf7-ccf410b092a9_1370x774.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<h2>Top Updates &#128170;</h2><ul><li><p><strong>Krisp launches VIVA 2.0</strong> with Turn Prediction v3 and a first-of-its-kind Interrupt Prediction model, all running on CPU with no transcription required. (<a href="https://krisp.ai/blog/viva-2-0-ai-infrastructure-for-voice-ai-agents/">Krisp Blog</a>)</p><div class="native-video-embed" data-component-name="VideoPlaceholder" data-attrs="{&quot;mediaUploadId&quot;:&quot;56ef9704-29e8-4a52-b708-d3d1e89ea776&quot;,&quot;duration&quot;:null}"></div></li><li><p><strong>OpenAI launches three real-time audio models</strong> for its API: GPT-Realtime-2 with GPT-5-class reasoning, GPT-Realtime-Translate for live translation across 70+ languages, and GPT-Realtime-Whisper for streaming speech-to-text. (<a href="https://www.reuters.com/business/media-telecom/openai-unveils-three-audio-models-real-time-voice-tasks-2026-05-07/">Reuters</a>)</p></li></ul><ul><li><p><strong>Twilio unveils a Conversation Layer at SIGNAL 2026</strong> with persistent Memory, Orchestrator, Intelligence, and open-source Agent Connect for plugging in any AI provider. (<a href="https://martech.org/twilio-launches-conversation-layer-to-unify-ai-and-human-interactions/">MarTech</a>)</p></li><li><p><strong>Inworld ships Realtime TTS-2,</strong> a frontier voice model that reads user emotion and tone in real time and adapts pacing, softness, and empathy mid-conversation. 
(<a href="https://www.morningstar.com/news/business-wire/20260505096579/inworld-launches-new-frontier-voice-model-that-gives-ai-agents-contextual-empathy">BusinessWire</a>)</p></li><li><p><strong>ServiceNow unveils Otto,</strong> a unified conversational AI layer combining Now Assist, Moveworks, and voice agents across every department and system. (<a href="https://theaieconomy.substack.com/p/servicenow-otto-conversational-ai-enterprise">The AI Economy</a>)</p></li><li><p><strong>SoundHound launches OASYS,</strong> a self-learning agentic platform that auto-builds, orchestrates, and improves voice AI agents from documentation and transcripts. (<a href="https://www.globenewswire.com/news-release/2026/05/05/3287821/0/en/soundhound-ai-introduces-oasys-the-world-s-first-self-learning-orchestrated-agentic-ai-platform-where-ai-builds-ai.html">GlobeNewsWire</a>)</p></li><li><p><strong>ElevenLabs adds BlackRock, NVIDIA, and Jamie Foxx</strong> to its $550M+ Series D as annualized revenue crosses $500M, up from $350M at the end of 2025. (<a href="https://techcrunch.com/2026/05/05/elevenlabs-lists-blackrock-jamie-foxx-and-eva-longoria-as-new-investors/">TechCrunch</a>)</p></li><li><p><strong>Greenhouse acquires Ezra AI Labs</strong> to bring voice AI interviewing into its ATS as applications per recruiter have spiked over 400% since 2023. (<a href="https://www.prnewswire.com/news-releases/greenhouse-has-entered-into-a-definitive-agreement-to-acquire-ezra-ai-labs-bringing-conversational-ai-to-the-hiring-process-302762658.html">PR Newswire</a>)</p></li><li><p><strong>Ethos raises $22.75M from a16z</strong> for an expert network that onboards 35K people per week through voice AI interviews. 
(<a href="https://techcrunch.com/2026/05/06/ethos-raises-22-75m-from-a16z-for-its-expert-network-with-voice-onboarding/">TechCrunch</a>)</p></li><li><p><strong>8x8 launches AI Studio</strong> in early availability, letting teams describe needs in plain language and deploy voice and digital AI agents without adding vendors. (<a href="https://www.cmswire.com/contact-center/8x8-expands-cx-platform-with-ai/">CMSWire</a>)</p></li><li><p><strong>Wispr Flow bets on India</strong> as its fastest-growing market with Hinglish dictation support, 2.5M downloads, and 100% month-over-month growth. (<a href="https://techcrunch.com/2026/05/09/voice-ai-in-india-is-hard-wispr-flow-is-betting-on-it-anyway/">TechCrunch</a>)</p></li><li><p><strong>ElevenLabs powers SpoonLabs&#8217; audio novels,</strong> cutting production time from months to hours and launching PodNovel across Korea, Japan, and Taiwan. (<a href="https://www.digitaltoday.co.kr/en/view/52978/elevenlabs-supplies-voice-ai-solution-to-spoonlabs-audio-platform">DigitalToday</a>)</p></li><li><p><strong>eGain launches AI Agent IVA,</strong> a knowledge-powered virtual agent that replaces IVR dial trees with natural conversation and 24/7 voice support. (<a href="https://www.globenewswire.com/news-release/2026/05/06/3288531/0/en/egain-launches-ai-agent-iva-to-deliver-accurate-conversational-customer-service.html">GlobeNewsWire</a>)</p></li><li><p><strong>Gnani.ai hires eight senior execs</strong> after its $10M Series B, processing over 30M voice AI calls daily for 200+ enterprise customers in India. (<a href="https://www.businesstoday.in/technology/story/gnaniai-hires-senior-executives-across-bfsi-product-and-ai-delivery-after-10-million-fundingg-530008-2026-05-06">BusinessToday</a>)</p></li><li><p><strong>Vobiz.ai raises $1M seed</strong> to build AI-native telephony infrastructure in India with DID provisioning, low-latency SIP trunking, and LLM audio streaming. 
(<a href="https://www.techinasia.com/news/indian-startup-vobiz-ai-secures-1m-for-voice-ai">Tech in Asia</a>)</p></li><li><p><strong>Twinnin targets $3M seed round</strong> for its voice and face cloning marketplace where actors license digital likenesses to studios, backed by Google and NVIDIA. (<a href="https://deadline.com/2026/05/ai-plaform-twinnin-funding-round-3-million-signs-up-twins-1236882734/">Deadline</a>)</p></li><li><p><strong>BCM One partners with TD Synnex</strong> to bring Pure IP voice services and SkySwitch UCaaS to the MSP channel through the distributor&#8217;s partner network. (<a href="https://www.crn.com/news/channel-news/2026/bcm-one-td-synnex-partnership-helps-msps-cash-in-on-voice-ai-opportunity">CRN</a>)</p></li><li><p><strong>AI note-taking earbuds go mainstream</strong> as Viaim and Mobvoi ship wireless earbuds that record, transcribe, and summarize meetings entirely on-device. (<a href="https://www.howtogeek.com/ai-note-taking-earbuds-record-and-summarize-meetings/">How-To Geek</a>)</p></li></ul><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://voice-ai-newsletter.krisp.ai/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe now&quot;,&quot;action&quot;:null,&quot;class&quot;:&quot;button-wrapper&quot;}" data-component-name="ButtonCreateButton"><a class="button primary button-wrapper" href="https://voice-ai-newsletter.krisp.ai/subscribe?"><span>Subscribe now</span></a></p><div><hr></div><h2>Engineering Corner &#128526;</h2><ul><li><p><strong>OpenAI publishes its WebRTC infrastructure playbook,</strong> detailing a split relay + transceiver architecture that routes voice AI sessions for 900M+ weekly users at 300-500ms latency. (<a href="https://openai.com/index/delivering-low-latency-voice-ai-at-scale/">OpenAI Blog</a>)</p></li></ul><ul><li><p><strong>TypeWhisper open-sources Mac dictation</strong> with 10 ASR engines including WhisperKit, Parakeet, Apple SpeechAnalyzer, Groq, and xAI Grok STT, all running locally. 
(<a href="https://github.com/TypeWhisper/typewhisper-mac">GitHub</a>)</p></li><li><p><strong>Dictee ships offline voice dictation for Linux</strong> as a KDE Plasma 6 plasmoid with Rust backend, 4 ASR engines, and NVIDIA Parakeet via ONNX Runtime. (<a href="https://github.com/rcspam/dictee">GitHub</a>)</p></li><li><p><strong>TTS models for Indian languages:</strong> a dev survey covering Hindi, Tamil, Bengali, and Telugu with architecture comparisons and demo links. (<a href="https://dev.to/vinodsrajpurohit/tts-models-for-indian-languages-the-tech-giving-bharat-a-voice-1ij7">dev.to</a>)</p></li><li><p><strong>Build a voice agent with LiveKit + AssemblyAI</strong> using Universal-3 Pro Streaming STT with function calling and MCP integration. (<a href="https://dev.to/martschweiger/build-a-voice-agent-with-livekit-and-assemblyais-voice-agent-api-3mnm">dev.to</a>)</p></li></ul><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://resources.krisp.ai/fullband-2025&quot;,&quot;text&quot;:&quot;Register now&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://resources.krisp.ai/fullband-2025"><span>Register now</span></a></p><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://voice-ai-newsletter.krisp.ai/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">Get the most important news in Voice AI delivered directly to your inbox every week</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div 
class="fake-button"></div></div></form></div></div>]]></content:encoded></item><item><title><![CDATA[Voice Agents Go Mainstream]]></title><description><![CDATA[Voice AI weekly digest]]></description><link>https://voice-ai-newsletter.krisp.ai/p/voice-agents-go-mainstream</link><guid isPermaLink="false">https://voice-ai-newsletter.krisp.ai/p/voice-agents-go-mainstream</guid><dc:creator><![CDATA[Davit Baghdasaryan]]></dc:creator><pubDate>Mon, 04 May 2026 13:17:51 GMT</pubDate><enclosure url="https://substack-post-media.s3.amazonaws.com/public/images/d31b576a-8c1c-46fe-be99-d396c55480dc_1896x1052.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Three important Voice AI events this week:</p><ul><li><p><a href="https://signal.twilio.com/">Twilio Signal</a> - May 6-7 in SF</p></li><li><p><a href="https://cerebralvalleyvoice.com/">Cerebral Valley Voice Summit</a> - May 6 in SF</p></li><li><p><a href="https://luma.com/SpeechAImeetup">NVIDIA Developer Meetup</a> | Building and Evaluating Real-time Voice Agents - May 7 in SF</p></li></ul><h2>Top Updates &#128170;</h2><ul><li><p><strong>xAI launches Custom Voices,</strong> a voice cloning API that creates a voice ID from 120 seconds of audio with speaker verification, plus 80+ built-in voices across 28 languages. (<a href="https://venturebeat.com/technology/xai-launches-grok-4-3-at-an-aggressively-low-price-and-a-new-fast-powerful-voice-cloning-suite">VentureBeat</a>)</p></li><li><p><strong>Microsoft ships real-time voice agents in Copilot Studio,</strong> now GA in Dynamics 365 Contact Center with low-latency speech-to-speech, interruptions, and mid-call language switching. 
(<a href="https://www.microsoft.com/en-us/microsoft-copilot/blog/copilot-studio/extend-ai-voice-support-introducing-real-time-voice-agents-in-microsoft-copilot-studio/">Microsoft Blog</a>)</p></li><li><p><strong>Amazon adds &#8220;Join the Chat&#8221; to product pages,</strong> letting shoppers ask voice or text questions during AI audio summaries and get real-time conversational answers. (<a href="https://techcrunch.com/2026/04/28/amazon-launches-an-ai-powered-audio-qa-experience-on-product-pages/">TechCrunch</a>)</p></li><li><p><strong>Otter.ai pivots from notetaker to Conversational Knowledge Engine,</strong> launching MCP connectors, AI Chat, and desktop app to turn meeting data into agentic workflows. (<a href="https://www.businesswire.com/news/home/20260428313206/en/Otter.ai-Evolves-from-AI-Notetaker-to-Create-%24100B-Enterprise-Conversational-Knowledge-Engine-Market">BusinessWire</a>)</p></li><li><p><strong>Deepgram launches Flux Multilingual</strong> with 10 languages and mid-call language switching, plus model-based turn detection under 400ms. (<a href="https://siliconangle.com/2026/04/29/deepgram-expands-flux-10-languages-mid-call-switching-voice-agents/">SiliconANGLE</a>)</p></li><li><p><strong>Twilio Q1 voice revenue hits a 19-quarter high,</strong> up 20% YoY with Conversational Intelligence and Branded Calling both growing over 100%. (<a href="https://thenextweb.com/news/twilio-q1-2026-voice-ai-revenue">The Next Web</a>)</p></li><li><p><strong>NordVPN adds AI voice deepfake detector</strong> to its Chrome extension, analyzing acoustic patterns in real time without recording or interpreting content. (<a href="https://betanews.com/article/nordvpn-adds-ai-voice-detector-to-its-chrome-extension/">BetaNews</a>)</p></li><li><p><strong>Audion raises $15M</strong> to bring AI-powered contextual audio ad targeting to the U.S., processing 500K hours of audio weekly for brands like Apple and Nike. 
(<a href="https://www.axios.com/2026/04/27/audion-audio-adtech-raises-us">Axios</a>)</p></li><li><p><strong>3CLogic launches outbound voice AI agents</strong> with multimodal voice+digital capabilities and an automated LLM-powered QA engine for scoring every AI interaction. (<a href="https://www.prnewswire.com/news-releases/3clogic-accelerates-enterprise-roi-with-new-outbound-voice-ai-agents-multimodal-voice-ai-capabilities-and-automated-ai-agent-evaluations-302753788.html">PR Newswire</a>)</p></li><li><p><strong>AI-generated podcasts are booming</strong> on Spotify, Apple, and YouTube, with AI hosts that sound convincingly human raising questions about disclosure. (<a href="https://www.inc.com/moses-jeanfrancois/ai-generated-podcasts-boom-on-audio-platforms-are-you-listening-to-one/91338876">Inc</a>)</p></li><li><p><strong>Tells launches AI voice agents on existing SMS numbers</strong> with a single toggle, adding sub-second-latency voice to any business texting line without a new number or integration. (<a href="https://aithority.com/machine-learning/tells-launches-ai-voice-agents-on-existing-sms-numbers-with-one-click/">AIthority</a>)</p></li><li><p><strong>SpeakON ships a MagSafe AI dictation accessory</strong> that turns iPhone voice input into formatted, tone-adapted text with translation across 12 languages. (<a href="https://9to5mac.com/2026/04/27/key-takeaways-after-testing-out-speakon-an-ai-powered-dictation-iphone-accessory/">9to5Mac</a>)</p></li><li><p><strong>Docplanner&#8217;s voice AI agent &#8220;Noa Booking&#8221;</strong> doubles doctor appointment bookings vs traditional call centers, built on Twilio ConversationRelay. 
(<a href="https://www.healthtechdigital.com/docplanner-expands-patient-access-with-voice-ai-agent-powered-by-twilio/">Health Tech Digital</a>)</p></li><li><p><strong>Lumeris adds native audio to its Tom platform</strong> using Gemini&#8217;s speech-to-speech capabilities for real-time, empathetic patient conversations in primary care. (<a href="https://hitconsultant.net/2026/04/22/lumeris-native-audio-tom-google-gemini-primary-care/">HIT Consultant</a>)</p></li><li><p><strong>Ablio launches AI-powered interpretation</strong> with hybrid human+AI model, combining ASR, neural translation, and TTS for live multilingual events on Zoom and Teams. (<a href="https://aithority.com/machine-learning/ablio-launches-ai-powered-interpretation-platform-with-hybrid-human-ai-model/">AIthority</a>)</p></li></ul><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://voice-ai-newsletter.krisp.ai/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe now&quot;,&quot;action&quot;:null,&quot;class&quot;:&quot;button-wrapper&quot;}" data-component-name="ButtonCreateButton"><a class="button primary button-wrapper" href="https://voice-ai-newsletter.krisp.ai/subscribe?"><span>Subscribe now</span></a></p><div><hr></div><h2>Engineering Corner &#128526;</h2><ul><li><p><strong>Sakana AI introduces KAME,</strong> a tandem speech-to-speech architecture that lets a backend LLM inject knowledge in real time while the front-end keeps talking with near-zero latency. (<a href="https://pub.sakana.ai/kame/">Sakana AI</a>)</p></li></ul><ul><li><p><strong>NVIDIA releases Nemotron 3 Nano Omni,</strong> an open 30B-A3B multimodal model unifying vision, audio, and language with 9x higher throughput than competing omni models. 
(<a href="https://blogs.nvidia.com/blog/nemotron-3-nano-omni-multimodal-ai-agents/">NVIDIA Blog</a>)</p></li><li><p><strong>OpenMOSS releases MOSS-Audio,</strong> an open-source foundation model for speech, sound, music understanding, and time-aware audio reasoning in 4B and 8B variants. (<a href="https://www.marktechpost.com/2026/04/27/openmoss-releases-moss-audio-an-open-source-foundation-model-for-speech-sound-music-and-time-aware-audio-reasoning/">MarkTechPost</a>)</p></li><li><p><strong>Async publishes open TTS benchmark</strong> revealing major accuracy gaps when streaming models handle phone numbers, dates, and prices in production. (<a href="https://podnews.net/press-release/async-ai-voice-benchmark">Podnews</a>)</p></li><li><p><strong>Speaker diarization explained:</strong> how AI knows who said what, from spectral embeddings to clustering. (<a href="https://dev.to/quillhub/speaker-diarization-explained-how-ai-knows-who-said-what-9fi">dev.to</a>)</p></li><li><p><strong>Laravel AI SDK tutorial:</strong> add TTS and voice to your app in 20 minutes. (<a href="https://dev.to/hafiz619/laravel-ai-sdk-add-text-to-speech-and-voice-to-your-app-in-20-minutes-35fb">dev.to</a>)</p></li><li><p><strong>Hobbyist builds a C-3PO head</strong> with real-time voice interaction using off-the-shelf speech models. 
(<a href="https://letsdatascience.com/news/hobbyist-builds-c-3po-head-with-real-time-voice-63356d06">Let&#8217;s Data Science</a>)</p></li></ul><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://resources.krisp.ai/fullband-2025&quot;,&quot;text&quot;:&quot;Register now&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://resources.krisp.ai/fullband-2025"><span>Register now</span></a></p><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://voice-ai-newsletter.krisp.ai/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">Get the most important news in Voice AI delivered directly to your inbox every week</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div>]]></content:encoded></item><item><title><![CDATA[Voice AI's Consolidation Begins]]></title><description><![CDATA[Voice AI weekly digest]]></description><link>https://voice-ai-newsletter.krisp.ai/p/voice-ais-consolidation-begins</link><guid isPermaLink="false">https://voice-ai-newsletter.krisp.ai/p/voice-ais-consolidation-begins</guid><dc:creator><![CDATA[Davit Baghdasaryan]]></dc:creator><pubDate>Mon, 27 Apr 2026 14:01:59 GMT</pubDate><enclosure url="https://substack-post-media.s3.amazonaws.com/public/images/788195ac-4ca6-4e56-a622-1d5168942cab_1662x1230.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Two important Voice AI events in the coming weeks:</p><ul><li><p><a href="https://signal.twilio.com/">Twilio 
Signal</a> - May 6-7 in SF</p></li><li><p><a href="https://cerebralvalleyvoice.com/">Cerebral Valley Voice Summit</a> - May 6 in SF</p></li></ul><h2>Top Updates &#128170;</h2><ul><li><p><strong>xAI launches Grok Voice Think Fast 1.0,</strong> ranking #1 on the tau-voice Bench for full-duplex voice agents and already powering Starlink support with a 20% sales conversion rate. (<a href="https://x.ai/news/grok-voice-think-fast-1">xAI</a>)</p></li><li><p><strong>Anker unveils THUS,</strong> the first compute-in-memory AI audio chip, claiming 150x more on-device AI power for noise cancellation in its upcoming Soundcore earbuds. (<a href="https://www.theverge.com/tech/916463/anker-thus-chip-announcement">The Verge</a>)</p></li><li><p><strong>SoundHound acquires LivePerson for $43M,</strong> combining voice agentic AI with LivePerson&#8217;s digital messaging platform that handles one billion customer messages per month. (<a href="https://www.globenewswire.com/news-release/2026/04/21/3278086/0/en/soundhound-ai-to-acquire-liveperson-combining-proprietary-voice-agentic-ai-and-digital-messaging-to-create-a-world-leading-end-to-end-omnichannel-conversational-ai-platform.html">GlobeNewswire</a>)</p></li><li><p><strong>Krisp Voice AI SDK wins two Webby Awards</strong> for Technical Achievement. (<a href="https://www.linkedin.com/feed/update/urn:li:activity:7452374372713046016/">LinkedIn</a>)</p></li><li><p><strong>Speechmatics delivers on-device STT for Adobe Premiere,</strong> transcribing an hour of video in 55 seconds offline with accuracy within 5% of cloud. (<a href="https://www.tvtechnology.com/production/adobe-and-speechmatics-deliver-cloud-grade-on-device-speech-recognition-for-premiere">TV Technology</a>)</p></li><li><p><strong>Nothing launches Essential Voice,</strong> an AI dictation tool that cleans filler words and formats speech-to-text system-wide in 100+ languages. 
(<a href="https://techcrunch.com/2026/04/24/nothing-introduces-an-ai-powered-dictation-tool/">TechCrunch</a>)</p></li><li><p><strong>Synthflow AI and 8x8 partner</strong> to embed no-code voice AI agents directly into the 8x8 Contact Center platform across 30+ languages. (<a href="https://venturebeat.com/business/synthflow-ai-and-8x8-enter-strategic-partnership-to-deliver-next-generation-agentic-ai">VentureBeat</a>)</p></li><li><p><strong>Google Meet AI note-taking now works for in-person meetings,</strong> generating transcripts, summaries, and action items from face-to-face conversations via mobile. (<a href="https://lifehacker.com/tech/google-meet-can-now-take-notes-during-in-person-meetings">Lifehacker</a>)</p></li><li><p><strong>Xiaomi releases MiMo v2.5 TTS and open-sources MiMo v2.5 ASR,</strong> a full voice pipeline with voice cloning, voice design, and dialect-aware recognition for the agent era. (<a href="https://www.gizmochina.com/2026/04/24/xiaomi-introduces-mimo-v2-5-tts-and-asr-as-a-full-voice-pipeline-for-the-agent-era/">Gizmochina</a>)</p></li><li><p><strong>Volkswagen will ship voice AI in all China-built cars</strong> starting H2 2026, using on-device LLMs from Tencent, Alibaba, and Baidu. (<a href="https://www.cnbc.com/2026/04/21/volkswagen-voice-ai-chinese-cars-automaker.html">CNBC</a>)</p></li><li><p><strong>Newo appoints new CEO after $25M Series A</strong> to scale partner-led voice AI infrastructure for MSPs, VoIP providers, and software platforms serving SMBs. (<a href="https://www.globenewswire.com/news-release/2026/04/21/3277867/0/en/jason-luo-appointed-ceo-of-newo-to-accelerate-partner-led-growth-in-voice-ai-infrastructure-following-25m-series-a.html">GlobeNewswire</a>)</p></li><li><p><strong>Ericsson embeds AI calling and fraud detection into IMS,</strong> partnering with Hiya for real-time spam blocking as 86% of unknown calls go unanswered. 
(<a href="https://www.ericsson.com/en/blog/2026/4/ai-voice-in-telecom-powering-calls-and-securing-networks">Ericsson Blog</a>)</p></li></ul><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://voice-ai-newsletter.krisp.ai/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe now&quot;,&quot;action&quot;:null,&quot;class&quot;:&quot;button-wrapper&quot;}" data-component-name="ButtonCreateButton"><a class="button primary button-wrapper" href="https://voice-ai-newsletter.krisp.ai/subscribe?"><span>Subscribe now</span></a></p><div><hr></div><h2>Engineering Corner &#128526;</h2><ul><li><p><strong>Streaming TTS models fail over 60% of sentences</strong> containing phone numbers, dates, and prices due to 5-20x less context than batch mode. (<a href="https://www.technology.org/2026/04/24/streaming-tts-models-fail-over-60-of-sentences-containing-numbers-dates-and-prices/">Technology.org</a>)</p></li></ul><ul><li><p><strong>AI neck sensor turns silent speech into voice</strong> by reading microscopic throat muscle movements with a CNN+transformer pipeline from POSTECH. (<a href="https://www.digitaltrends.com/wearables/ai-powered-neck-sensor-can-turn-silent-speech-into-audible-voice/">Digital Trends</a>)</p></li><li><p><strong>AWS guide to cost-effective multilingual transcription</strong> at scale using NVIDIA Parakeet TDT and AWS Batch. (<a href="https://aws.amazon.com/blogs/machine-learning/cost-effective-multilingual-audio-transcription-at-scale-with-parakeet-tdt-and-aws-batch/">AWS Blog</a>)</p></li><li><p><strong>Ghost Pepper:</strong> open-source browser extension for real-time voice transcription and LLM-powered responses. (<a href="https://matthartman.github.io/ghost-pepper/">GitHub</a>)</p></li><li><p><strong>Mimi Codec deep-dive</strong> on its layered audio compression design for neural speech coding. 
(<a href="https://letsdatascience.com/news/mimi-codec-reveals-layered-audio-compression-design-4ea7aa4a">Let&#8217;s Data Science</a>)</p></li><li><p><strong>AssemblyAI showcases configurable STT</strong> with tunable turn-taking, medical mode for streaming, and real-time speaker labeling. (<a href="https://www.tipranks.com/news/private-companies/assemblyai-showcases-configurable-speech-to-text-features-for-voice-ai-developers">TipRanks</a>)</p></li></ul><p></p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://resources.krisp.ai/fullband-2025&quot;,&quot;text&quot;:&quot;Register now&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://resources.krisp.ai/fullband-2025"><span>Register now</span></a></p><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://voice-ai-newsletter.krisp.ai/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">Get the most important news in Voice AI delivered directly to your inbox every week</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div>]]></content:encoded></item><item><title><![CDATA[Everyone Wants a Voice Platform]]></title><description><![CDATA[Voice AI weekly digest]]></description><link>https://voice-ai-newsletter.krisp.ai/p/everyone-wants-a-voice-platform</link><guid isPermaLink="false">https://voice-ai-newsletter.krisp.ai/p/everyone-wants-a-voice-platform</guid><dc:creator><![CDATA[Davit Baghdasaryan]]></dc:creator><pubDate>Mon, 20 Apr 2026 
14:04:17 GMT</pubDate><enclosure url="https://substack-post-media.s3.amazonaws.com/public/images/673d0805-1f8f-4fae-a65a-cce1decc4a2d_1520x754.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<h2>Top Updates &#128170;</h2><ul><li><p><strong>xAI ships standalone Grok STT and TTS APIs</strong> with streaming transcription at $0.20/hr and expressive TTS with inline emotion tags across 20 languages. (<a href="https://x.ai/news/grok-stt-and-tts-apis">xAI</a>)</p></li><li><p><strong>Google launches Gemini 3.1 Flash TTS</strong> with 200+ audio tags for fine-grained voice control, multi-speaker dialogue, and SynthID watermarking across 70+ languages. (<a href="https://blog.google/innovation-and-ai/models-and-research/gemini-models/gemini-3-1-flash-tts/">Google Blog</a>)</p></li></ul><ul><li><p><strong>Starlink customer support is now Grok-powered,</strong> with a voice AI agent handling sales, troubleshooting, and account setup on the phone line. (<a href="https://au.pcmag.com/networking/117104/hi-this-is-ai-starlinks-customer-support-now-features-grok-voice-chatbot">PCMag</a>)</p></li><li><p><strong>Cloudflare adds real-time voice to its Agents SDK,</strong> enabling voice-enabled agents over WebSockets in ~30 lines of server code on Durable Objects. (<a href="https://blog.cloudflare.com/voice-agents/">Cloudflare Blog</a>)</p></li><li><p><strong>DeepL launches voice-to-voice translation</strong> for meetings with Zoom and Teams add-ons, plus a developer API for custom use cases like call centers. (<a href="https://techcrunch.com/2026/04/16/deepl-known-for-text-translation-now-wants-to-translate-your-voice/">TechCrunch</a>)</p></li><li><p><strong>Phonely raises $16M Series A</strong> for AI phone agents that drove $10M+ in insurance policy sales for a single customer this year. 
(<a href="https://www.axios.com/pro/enterprise-software-deals/2026/04/15/voice-ai-startup-phonely-16-million">Axios</a>)</p></li><li><p><strong>Krisp launches British English accent conversion,</strong> letting offshore agents in India, Philippines, and beyond sound local for UK-facing programs in real time. (<a href="https://cxm.world/customer-experience/krisp-expands-accent-conversion-to-british-english-targeting-uk-facing-offshore-operations/">CXM World</a>)</p></li><li><p><strong>interface.ai launches Nexus,</strong> a fully agentic CCaaS platform for credit unions that eliminates hold queues by keeping AI in the conversation with human backup. (<a href="https://www.globenewswire.com/news-release/2026/04/14/3273785/0/en/interface-ai-Launches-Nexus-The-World-s-First-Fully-Agentic-CCaaS-Platform-That-Ends-the-Era-of-Hold-Queues-Transfers-and-Binary-Call-Routing-for-Credit-Unions-and-Community-Banks.html">GlobeNewswire</a>)</p></li><li><p><strong>ConverseNow partners with Deliverect</strong> to pipe voice AI phone and drive-thru orders into unified restaurant order management across thousands of locations. (<a href="http://www.prnewswire.com/news-releases/conversenow-and-deliverect-announce-partnership-to-bring-voice-ai-ordering-into-unified-restaurant-order--menu-management-302743431.html">PR Newswire</a>)</p></li><li><p><strong>ENCO unveils enSpeak at NAB Show,</strong> adding real-time voice translation to its captioning workflow so viewers can hear live broadcasts in their preferred language. 
(<a href="https://content-technology.com/nabshow/enco-enspeak-adds-real-time-voice-translation-to-captioning/">Content + Technology</a>)</p></li></ul><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://voice-ai-newsletter.krisp.ai/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe now&quot;,&quot;action&quot;:null,&quot;class&quot;:&quot;button-wrapper&quot;}" data-component-name="ButtonCreateButton"><a class="button primary button-wrapper" href="https://voice-ai-newsletter.krisp.ai/subscribe?"><span>Subscribe now</span></a></p><div><hr></div><h2>Engineering Corner &#128526;</h2><ul><li><p><strong>NVIDIA releases Audio Flamingo Next (AF-Next),</strong> an open large audio-language model that understands speech, sound, and music with 30-minute context and timestamp-grounded reasoning. (<a href="https://www.marktechpost.com/2026/04/14/nvidia-and-the-university-of-maryland-researchers-released-audio-flamingo-next-af-next-a-super-powerful-and-open-large-audio-language-model/">MarkTechPost</a>)</p></li></ul><ul><li><p><strong>MOSS-TTS-Nano-100M</strong> brings multilingual voice cloning to CPUs with a 100M-param model that streams 48kHz audio in 20 languages. (<a href="https://hackernoon.com/moss-tts-nano-100m-brings-multilingual-voice-cloning-to-cpus">HackerNoon</a>)</p></li><li><p><strong>Hands-on VibeVoice tutorial</strong> covering speaker-aware ASR, real-time TTS, and speech-to-speech pipelines with code. (<a href="https://www.marktechpost.com/2026/04/12/a-hands-on-coding-tutorial-for-microsoft-vibevoice-covering-speaker-aware-asr-real-time-tts-and-speech-to-speech-pipelines/">MarkTechPost</a>)</p></li><li><p><strong>Build a real-time voice agent with Pipecat,</strong> step-by-step guide to streaming STT/TTS pipelines. (<a href="https://hackernoon.com/how-to-build-a-real-time-voice-agent-with-pipecat">HackerNoon</a>)</p></li><li><p><strong>Build an AI medical scribe</strong> using voice agents for clinical documentation. 
(<a href="https://hackernoon.com/how-to-build-an-ai-medical-scribe-with-voice-agents">HackerNoon</a>)</p></li><li><p><strong>Diction:</strong> self-hosted STT setup guide as an open alternative to Wispr Flow. (<a href="https://dev.to/omachala/how-to-set-up-diction-the-self-hosted-speech-to-text-alternative-to-wispr-flow-20km">dev.to</a>)</p></li><li><p><strong>Deepgram and Modulate benchmarked</strong> against real-world audio conditions. (<a href="https://hackernoon.com/how-deepgram-and-modulate-benchmark-against-real-world-audio">HackerNoon</a>)</p></li></ul><div><hr></div><p></p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://resources.krisp.ai/fullband-2025&quot;,&quot;text&quot;:&quot;Register now&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://resources.krisp.ai/fullband-2025"><span>Register now</span></a></p><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://voice-ai-newsletter.krisp.ai/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">Get the most important news in Voice AI delivered directly to your inbox every week</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div>]]></content:encoded></item><item><title><![CDATA[The Week Voice AI Went Local]]></title><description><![CDATA[Voice AI weekly digest]]></description><link>https://voice-ai-newsletter.krisp.ai/p/the-week-voice-ai-went-local</link><guid 
isPermaLink="false">https://voice-ai-newsletter.krisp.ai/p/the-week-voice-ai-went-local</guid><dc:creator><![CDATA[Davit Baghdasaryan]]></dc:creator><pubDate>Mon, 13 Apr 2026 14:03:24 GMT</pubDate><enclosure url="https://substack-post-media.s3.amazonaws.com/public/images/05828bca-7217-477c-a39f-e09b3205d8c9_1292x1042.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<h2>Top Updates &#128170;</h2><ul><li><p><strong>Krisp brings Accent Conversion to YouTube</strong> with a free Chrome extension for 2.7B users. (<a href="https://www.linkedin.com/posts/artominasyan_another-breakthrough-from-krisp-in-voice-ugcPost-7449434256495321088-pF4v/?utm_source=share&amp;utm_medium=member_desktop&amp;rcm=ACoAAAQVi7YBoesfpCTzc8u0PuwKzzV1-wDM7jw">LinkedIn</a>)</p></li><li><p><strong>Google quietly ships AI Edge Eloquent,</strong> a free offline-first dictation app for iOS running on-device Gemma models with filler removal and no subscription. (<a href="https://techcrunch.com/2026/04/07/google-quietly-releases-an-offline-first-ai-dictation-app-on-ios/">TechCrunch</a>)</p></li></ul><ul><li><p><strong>Mistral launches Voxtral TTS,</strong> a 4B open-weights streaming speech model in 9 languages that beats ElevenLabs Flash v2.5 in voice cloning win rates. (<a href="https://slator.com/mistral-text-to-speech-model/">Slator</a>)</p></li><li><p><strong>ByteDance introduces Seeduplex,</strong> a native full-duplex speech LLM that listens while speaking and cuts false interruption rates in half vs half-duplex Doubao. (<a href="https://seed.bytedance.com/en/blog/introducing-seed-full-duplex-speech-llm-attentive-listening-robust-interference-suppression-enabling-more-natural-interaction">ByteDance Seed</a>)</p></li><li><p><strong>Willow launches Atlas-1,</strong> a new frontier STT model built on human-powered transcription infrastructure that claims to beat ElevenLabs, Deepgram, and OpenAI. 
(<a href="https://www.vp-land.com/p/willow-launches-atlas-1-claims-a-new-standard-for-speech-to-text-accuracy">VP-Land</a>)</p></li><li><p><strong>Telnyx launches LiveKit on Telnyx,</strong> a hosted platform running LiveKit agents on Telnyx infrastructure with 50% lower cost and sub-200ms latency. (<a href="https://telecomreseller.com/2026/04/10/telnyx-launches-livekit-on-telnyx-for-deploying-voice-ai-agents-with-lower-cost-and-ultra-low-latency/">Telecom Reseller</a>)</p></li><li><p><strong>Natter raises $23M Series A</strong> led by Renegade Partners to replace enterprise surveys with AI-moderated 1:1 video conversations at scale. (<a href="https://ventureburn.com/natter-raises-23m/">VentureBurn</a>)</p></li><li><p><strong>Twilio Q4 voice AI revenue grew 60%</strong> as the company closed its biggest enterprise deal ever and repositioned as AI infrastructure. (<a href="https://www.cxtoday.com/contact-center/twilio-q4-2025-earnings-voice-ai-enterprise/">CX Today</a>)</p></li><li><p><strong>Regal AI launches Copilot,</strong> a self-improving voice agent builder that learns from call outcomes and flags underperformance automatically. (<a href="https://siliconangle.com/2026/04/08/regal-ai-launches-copilot-building-self-improving-voice-ai-agents/">SiliconANGLE</a>)</p></li><li><p><strong>Exotel acqui-hires Dubverse core team</strong> to lead conversation quality analytics and AI, deepening its voice AI stack for Indian enterprises. (<a href="https://www.techcircle.in/2026/04/08/exotel-acqui-hires-dubverse-core-team-to-boost-voice-ai-for-enterprises/">TechCircle</a>)</p></li><li><p><strong>Californians sue Sutter and MemorialCare</strong> over use of Abridge AI scribe that allegedly recorded doctor-patient visits without clear patient consent. 
(<a href="https://arstechnica.com/tech-policy/2026/04/californians-sue-over-ai-tool-that-records-doctor-visits/">Ars Technica</a>)</p></li><li><p><strong>Five9 expands Fusion ecosystem with AI Agent Connect API,</strong> letting enterprises wire voice AI agents into third-party systems and Assembled WFM. (<a href="https://finance.yahoo.com/markets/stocks/articles/five9-fivn-expanding-agentic-ai-170642759.html">Yahoo Finance</a>)</p></li><li><p><strong>Weya AI open-sources Hush,</strong> an 8MB speech enhancement model with 1.8M params that isolates the primary speaker in under 1ms per frame, CPU-only. (<a href="https://www.indianweb2.com/2026/04/weya-ai-launches-hush-lightweight-open.html">IndianWeb2</a>)</p></li><li><p><strong>Shunya Labs launches voice AI platform</strong> for dubbing, translation, lip-sync, and low-shot voice cloning for entertainment localization. (<a href="https://www.passionateinmarketing.com/shunya-labs-launches-end-to-end-voice-ai-platform-for-dubbing-translation-and-multilingual-content-localisation/">Passionate in Marketing</a>)</p></li><li><p><strong>Beaver AI launches Magic Whiteboard,</strong> a privacy-first meeting assistant that transcribes in real time but never records or stores audio. 
(<a href="https://www.prweb.com/releases/the-first-ai-meeting-platform-that-never-records-you---beaver-ai-launches-magic-whiteboard-302734394.html">PRWeb</a>)</p></li></ul><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://voice-ai-newsletter.krisp.ai/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe now&quot;,&quot;action&quot;:null,&quot;class&quot;:&quot;button-wrapper&quot;}" data-component-name="ButtonCreateButton"><a class="button primary button-wrapper" href="https://voice-ai-newsletter.krisp.ai/subscribe?"><span>Subscribe now</span></a></p><div><hr></div><h2>Engineering Corner &#128526;</h2><ul><li><p><strong>AWS on Nova Multimodal Embeddings</strong> for semantic audio search across tone, emotion, and events, unified with text/image/video in a single vector space. (<a href="https://aws.amazon.com/blogs/machine-learning/building-intelligent-audio-search-with-amazon-nova-embeddings-a-deep-dive-into-semantic-audio-understanding/">AWS Blog</a>)</p></li></ul><ul><li><p><strong>Voxtral TTS surgery:</strong> deep-dive into reconstructing codec audio from intermediate model states. (<a href="https://towardsdatascience.com/voxtral-tts-surgery-codes-from-audio-reconstruction-2/">Towards Data Science</a>)</p></li><li><p><strong>Kokoro 82M TTS</strong> runs fully offline on CPU with 8 languages and 26 voices in a ~350MB footprint. (<a href="https://www.geeky-gadgets.com/local-offline-tts-kokoro/">Geeky Gadgets</a>)</p></li><li><p><strong>docker-whisper:</strong> self-hosted Whisper ASR in a container for easy local deployment. (<a href="https://github.com/hwdsl2/docker-whisper">GitHub</a>)</p></li><li><p><strong>Browser-based STT with Whisper:</strong> tutorial on running Whisper inference entirely in the browser. (<a href="https://dev.to/linmingren/building-a-browser-based-speech-to-text-system-with-whisper-ai-23e5">dev.to</a>)</p></li><li><p><strong>Lightweight offline TTS for Node.js</strong> using a minimal dependency chain. 
(<a href="https://dev.to/pavkode/lightweight-offline-text-to-speech-solution-for-nodejs-applications-4n68">dev.to</a>)</p></li><li><p><strong>Designing a real-time voice agent</strong> with RAG, SIP, and compliance guardrails. (<a href="https://hackernoon.com/designing-a-real-time-ai-voice-agent-with-rag-sip-integration-and-compliance-guardrails">HackerNoon</a>)</p></li><li><p><strong>Open-source Amazon Lex connector for Cisco Webex Contact Center</strong> for adding virtual agents without a platform rebuild. (<a href="https://aws.amazon.com/blogs/apn/cisco-conversational-ai-powered-by-amazon-lex/">AWS APN Blog</a>)</p></li></ul><div><hr></div><p></p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://resources.krisp.ai/fullband-2025&quot;,&quot;text&quot;:&quot;Register now&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://resources.krisp.ai/fullband-2025"><span>Register now</span></a></p><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://voice-ai-newsletter.krisp.ai/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">Get the most important news in Voice AI delivered directly to your inbox every week</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div>]]></content:encoded></item><item><title><![CDATA[Microsoft Enters the Voice AI Race]]></title><description><![CDATA[Voice AI weekly 
digest]]></description><link>https://voice-ai-newsletter.krisp.ai/p/microsoft-enters-the-voice-ai-race</link><guid isPermaLink="false">https://voice-ai-newsletter.krisp.ai/p/microsoft-enters-the-voice-ai-race</guid><dc:creator><![CDATA[Davit Baghdasaryan]]></dc:creator><pubDate>Mon, 06 Apr 2026 14:03:48 GMT</pubDate><enclosure url="https://substack-post-media.s3.amazonaws.com/public/images/9fa20f24-6261-4c1c-a5a2-644f8de51ddd_1902x946.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<h2>Top Updates &#128170;</h2><ul><li><p><strong>Microsoft launches MAI-Transcribe-1 and MAI-Voice-1</strong> - Two new in-house models: a batch transcription model (top 25 languages, 2.5x faster than Azure Fast) and a voice generation model that produces 60s of audio in 1s. Available now in Foundry. (<a href="https://venturebeat.com/technology/microsoft-launches-3-new-ai-models-in-direct-shot-at-openai-and-google">VentureBeat</a>) (<a href="https://microsoft.ai/news/today-were-announcing-3-new-world-class-mai-models-available-in-foundry/">Microsoft AI</a>)</p></li></ul><ul><li><p><strong>Microsoft open-sources VibeVoice</strong> - Family of TTS and ASR models under MIT license. TTS handles up to 90 minutes with 4 speakers. ASR transcribes 60-minute audio in a single pass with speaker diarization. Already at 27K GitHub stars. (<a href="https://github.com/microsoft/VibeVoice">GitHub</a>)</p></li><li><p><strong>Alibaba releases Qwen3.5-Omni</strong> - Native multimodal model processing text, audio, video in one pipeline. Speech recognition in 113 languages, generation in 36. Built-in turn-taking recognition that distinguishes backchanneling from real interruptions. Closed source, API only. 
(<a href="https://www.marktechpost.com/2026/03/30/alibaba-qwen-team-releases-qwen3-5-omni-a-native-multimodal-model-for-text-audio-video-and-realtime-interaction/">MarkTechPost</a>)</p></li><li><p><strong>Modulate launches Velma Deepfake Detect</strong> - Synthetic voice detection API ranked #1 on the HuggingFace Deepfake Speech leaderboard. Claims 578x lower cost than the next-best model, making always-on call monitoring viable. (<a href="https://gamesbeat.com/modulatess-velma-deepfake-detect-focuses-on-synthetic-voice-detection/">GamesBeat</a>)</p></li><li><p><strong>CNTXT AI launches Munsit</strong> - Arabic voice AI platform combining ASR and TTS across 25+ dialects. Already processing over a million minutes of audio for 250+ government and enterprise orgs in the UAE. (<a href="https://www.zawya.com/en/press-release/companies-news/cntxt-ai-launches-munsit-the-worlds-most-accurate-arabic-voice-ai-as-demand-for-ai-services-accelerates-across-the-uae-fw5z241m">Zawya</a>)</p></li><li><p><strong>Retell AI makes Wing VC Enterprise Tech 30</strong> - Voice AI agent platform hit $50M ARR and powers 50M+ real-time AI phone calls per month. One of three voice AI companies on the list. (<a href="https://www.globenewswire.com/news-release/2026/04/03/3268014/0/en/Voice-AI-Startup-Retell-AI-Named-to-Wing-VC-Enterprise-Tech-30-2026-List-Celebrating-the-Best-of-Enterprise-Tech.html">GlobeNewswire</a>)</p></li><li><p><strong>Speechify launches Windows app with on-device models</strong> - Local Whisper-based transcription and neural TTS on Copilot+ PCs and GPUs. No cloud needed. Competing with Wispr Flow and Superwhisper. (<a href="https://techcrunch.com/2026/03/31/speechifys-windows-app-uses-local-models-for-transcription-and-dictation/">TechCrunch</a>)</p></li><li><p><strong>The hidden cost of agentic AI callers</strong> - Some B2B contact centers seeing 15-20% of inbound volume from AI agents at peak. They wait forever, consume resources, and extract operational data. 
Detection is key. (<a href="https://www.symnexconsulting.com/blog/hidden-cost-of-agentic-ai-callers">SymNex</a>)</p></li><li><p><strong>AudioShake ships real-time audio separation SDK</strong> - Source separation for iOS, Android, Windows, Linux. Ranked #1 in Meta&#8217;s SAM audio benchmarks. Used by Warner, Universal, Sony, Disney. Now available for edge deployment. (<a href="https://slator.com/ai-audio-separation-audioshake/">Slator</a>)</p></li><li><p><strong>AI voice scams surge with 3-second cloning</strong> - Scammers cloning family members&#8217; voices from short social media clips. BBB and FTC warnings. AI-generated voice fraud up 1,200% in 2025. (<a href="https://www.moneycontrol.com/news/business/personal-finance/think-it-s-your-family-calling-why-ai-voice-scams-are-getting-harder-to-spot-13878976.html">MoneyControl</a>)</p></li><li><p><strong>MiraVoice raises $6.3M</strong> - AI voice agent for long-form phone surveys (120+ questions, 40+ min). Seed round led by Unusual Ventures. (<a href="https://news.crunchbase.com/venture/ai-interviewer-miravoice-raises-seed-funding-unusual/">Crunchbase</a>)</p></li><li><p><strong>Gnani.ai raises $10M Series B</strong> - India&#8217;s leading voice AI platform, 30M+ voice interactions daily in 12+ languages. Also launched Inya VoiceOS, a 5B-parameter voice-to-voice model. (<a href="https://www.businesstoday.in/technology/story/gnaniai-raises-10-million-in-funding-from-aavishkaar-capital-to-scale-global-voice-ai-push-523218-2026-03-31">BusinessToday</a>)</p></li><li><p><strong>Insight Health raises $11M Series A</strong> - Voice and chat AI agents for clinical admin: patient screening, referral processing, EHR documentation. Integrated with athenahealth. 
(<a href="https://www.mobihealthnews.com/news/insight-health-raises-11m-scale-clinical-ai-agents">MobiHealthNews</a>)</p></li><li><p><strong>Google Gboard adds Bluetooth mic for voice typing</strong> - Finally lets you dictate through connected earbuds instead of phone mic. Rolling out via server-side update. (<a href="https://www.androidauthority.com/gboard-voice-typing-bluetooth-earbuds-3652971/">Android Authority</a>)</p></li></ul><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://voice-ai-newsletter.krisp.ai/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe now&quot;,&quot;action&quot;:null,&quot;class&quot;:&quot;button-wrapper&quot;}" data-component-name="ButtonCreateButton"><a class="button primary button-wrapper" href="https://voice-ai-newsletter.krisp.ai/subscribe?"><span>Subscribe now</span></a></p><div><hr></div><h2>Engineering Corner &#128526;</h2><ul><li><p><strong>Orpheus-FastAPI</strong> - Self-hosted TTS server with OpenAI-compatible API. 8 voices, emotion tags, long-form batching. Connects to llama.cpp, LM Studio, GPUStack. Apache 2.0. (<a href="https://github.com/Lex-au/Orpheus-FastAPI">GitHub</a>)</p></li><li><p><strong>MeloTTS</strong> - Multi-lingual TTS library by MyShell.ai. English (4 accents), Spanish, French, Chinese, Japanese, Korean. Runs in real time on CPU. MIT license. (<a href="https://github.com/myshell-ai/MeloTTS">GitHub</a>)</p></li><li><p><strong>Build a voice-enabled AI agent in n8n</strong> - Step-by-step tutorial for wiring up voice input/output in n8n workflows. (<a href="https://dev.to/kfuras/build-a-voice-enabled-ai-agent-in-n8n-3oke">dev.to</a>)</p></li><li><p><strong>How to choose the best STT API for voice agents</strong> - Comparison of latency, accuracy, and cost tradeoffs across providers. 
(<a href="https://hackernoon.com/how-to-choose-the-best-speech-to-text-api-for-voice-agents">HackerNoon</a>)</p></li><li><p><strong>The hidden audio bias in audio-visual speech recognition</strong> - Analysis of how AV-ASR models over-rely on audio, undermining the visual modality. (<a href="https://hackernoon.com/the-hidden-audio-bias-inside-audio-visual-speech-recognition">HackerNoon</a>)</p></li><li><p><strong>Why speech recognition APIs need a different architecture</strong> - Smallest AI on designing ASR for real-time voice agent use cases vs batch transcription. (<a href="https://dev.to/smallestai-community/why-speech-recognition-api-requires-a-different-architecture-46ed">dev.to</a>)</p></li></ul><div><hr></div><p></p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://resources.krisp.ai/fullband-2025&quot;,&quot;text&quot;:&quot;Register now&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://resources.krisp.ai/fullband-2025"><span>Register now</span></a></p><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://voice-ai-newsletter.krisp.ai/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">Get the most important news in Voice AI delivered directly to your inbox every week</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div>]]></content:encoded></item><item><title><![CDATA[Krisp Is Nominated for 3 Webby Awards]]></title><description><![CDATA[Your vote decides who 
wins.]]></description><link>https://voice-ai-newsletter.krisp.ai/p/krisp-is-nominated-for-3-webby-awards</link><guid isPermaLink="false">https://voice-ai-newsletter.krisp.ai/p/krisp-is-nominated-for-3-webby-awards</guid><dc:creator><![CDATA[Davit Baghdasaryan]]></dc:creator><pubDate>Thu, 02 Apr 2026 13:54:40 GMT</pubDate><enclosure url="https://substack-post-media.s3.amazonaws.com/public/images/49535f6e-5c33-47eb-bad8-55b20a62bb00_1000x700.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!Ou3b!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2585b1fa-9f16-4b73-9cd2-7267d4fe0f9a_1200x300.jpeg" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!Ou3b!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2585b1fa-9f16-4b73-9cd2-7267d4fe0f9a_1200x300.jpeg 424w, https://substackcdn.com/image/fetch/$s_!Ou3b!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2585b1fa-9f16-4b73-9cd2-7267d4fe0f9a_1200x300.jpeg 848w, https://substackcdn.com/image/fetch/$s_!Ou3b!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2585b1fa-9f16-4b73-9cd2-7267d4fe0f9a_1200x300.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!Ou3b!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2585b1fa-9f16-4b73-9cd2-7267d4fe0f9a_1200x300.jpeg 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!Ou3b!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2585b1fa-9f16-4b73-9cd2-7267d4fe0f9a_1200x300.jpeg" width="1200" height="300" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/2585b1fa-9f16-4b73-9cd2-7267d4fe0f9a_1200x300.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:300,&quot;width&quot;:1200,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:86772,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/jpeg&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://voice-ai-newsletter.krisp.ai/i/192785886?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2585b1fa-9f16-4b73-9cd2-7267d4fe0f9a_1200x300.jpeg&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!Ou3b!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2585b1fa-9f16-4b73-9cd2-7267d4fe0f9a_1200x300.jpeg 424w, https://substackcdn.com/image/fetch/$s_!Ou3b!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2585b1fa-9f16-4b73-9cd2-7267d4fe0f9a_1200x300.jpeg 848w, https://substackcdn.com/image/fetch/$s_!Ou3b!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2585b1fa-9f16-4b73-9cd2-7267d4fe0f9a_1200x300.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!Ou3b!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2585b1fa-9f16-4b73-9cd2-7267d4fe0f9a_1200x300.jpeg 
1456w" sizes="100vw" fetchpriority="high"></picture><div></div></div></a></figure></div><p>Krisp has been nominated for three 2026 Webby Awards for Technical Achievement, Developer Tools &amp; APIs, and Best Use of AI Voice &amp; Conversational Interface.</p><p>The Webby Awards are one of the most recognized honors in digital technology. Getting nominated in three categories, all tied to voice AI, is a meaningful signal of where this space is headed and the work the team has put in to get us here.</p><p>The People&#8217;s Voice Award is decided solely by public vote. </p><p><strong>If you follow this newsletter, you already believe in what we&#8217;re building, and we'd love your support.</strong></p><h3><strong>Click each link below to vote:</strong></h3><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://vote.webbyawards.com/PublicVoting#/2026/apps-software-immersive/app-excellence/technical-achievement&quot;,&quot;text&quot;:&quot;Technical Achievement&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://vote.webbyawards.com/PublicVoting#/2026/apps-software-immersive/app-excellence/technical-achievement"><span>Technical Achievement</span></a></p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://vote.webbyawards.com/PublicVoting#/2026/apps-software-immersive/business-software-services/developer-tools-apis&quot;,&quot;text&quot;:&quot;Developer Tools &amp; APIs&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://vote.webbyawards.com/PublicVoting#/2026/apps-software-immersive/business-software-services/developer-tools-apis"><span>Developer Tools &amp; APIs</span></a></p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://wbby.co/58853N&quot;,&quot;text&quot;:&quot;AI Voice &amp; Conversational 
Interface&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://wbby.co/58853N"><span>AI Voice &amp; Conversational Interface</span></a></p><p><strong>Or type &#8220;Krisp&#8221; into the category search bar and our nominations will surface for one-click voting.</strong></p><p>You can cast one vote per category. Voting closes April 16.</p><p>Thank you, and more soon.</p><p>&#8212; Davit</p><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://voice-ai-newsletter.krisp.ai/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">Thanks for reading Voice AI Newsletter! Subscribe for free to receive new posts and support my work.</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div>]]></content:encoded></item><item><title><![CDATA[3 New Open-Source Voice Models Drop in One Week]]></title><description><![CDATA[Voice AI weekly digest]]></description><link>https://voice-ai-newsletter.krisp.ai/p/3-new-open-source-voice-models-drop</link><guid isPermaLink="false">https://voice-ai-newsletter.krisp.ai/p/3-new-open-source-voice-models-drop</guid><dc:creator><![CDATA[Davit Baghdasaryan]]></dc:creator><pubDate>Mon, 30 Mar 2026 14:03:12 GMT</pubDate><enclosure url="https://substack-post-media.s3.amazonaws.com/public/images/c7a3c93a-0840-497a-84dd-fb57fc8122d8_1296x872.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<h2>Top Updates &#128170;</h2><ul><li><p><strong>Mistral launches Voxtral TTS</strong> - 
Open-weight 4B TTS model. 9 languages, 90ms TTFA, 6x RTF. Runs on consumer GPUs. Mistral claims it beats ElevenLabs on quality benchmarks. (<a href="https://techcrunch.com/2026/03/26/mistral-releases-a-new-open-source-model-for-speech-generation/">TechCrunch</a>) (<a href="https://mistral.ai/news/voxtral-tts">Mistral blog</a>)</p></li></ul><ul><li><p><strong>Cohere releases Transcribe</strong> - Open-source 2B ASR model built for edge. 14 languages, 5.42 avg WER on HF Open ASR leaderboard, beating Zoom Scribe v1, IBM Granite 4.0, ElevenLabs Scribe v2, and Qwen3-ASR. Free via API and HuggingFace. (<a href="https://techcrunch.com/2026/03/26/cohere-launches-an-open-source-voice-model-specifically-for-transcription/">TechCrunch</a>) (<a href="https://cohere.com/blog/transcribe">Cohere blog</a>)</p></li><li><p><strong>Google ships Gemini 3.1 Flash Live + Search Live goes global</strong> - Real-time voice/video model with native function calling. 90.8% on ComplexFuncBench Audio (~20% jump over prev gen). Now powers Search Live in 200+ countries with voice and camera input. (<a href="https://blog.google/innovation-and-ai/models-and-research/gemini-models/gemini-3-1-flash-live/">Google blog</a>) (<a href="https://techcrunch.com/2026/03/26/google-is-launching-search-live-globally/">TechCrunch</a>)</p></li><li><p><strong>Smallest AI launches Lightning V3</strong> - 3.89 MOS in conversational evals, claims to beat OpenAI, Cartesia, and ElevenLabs. 15 languages with auto-detection and mid-sentence switching. Voice cloning from 5-15s of audio. (<a href="https://smallest.ai/blog/introducing-lightning-v3">Smallest.ai blog</a>)</p></li><li><p><strong>Amazon Polly adds Bidirectional Streaming</strong> - Stream text to Polly token-by-token as your LLM generates it, get audio back in real time over HTTP/2. 39% faster than batch approach, collapses 27 API calls to 1 on a 970-word passage. GA now. 
(<a href="https://aws.amazon.com/blogs/machine-learning/introducing-amazon-polly-bidirectional-streaming-real-time-speech-synthesis-for-conversational-ai/">AWS blog</a>)</p></li><li><p><strong>AWS adds WebRTC to Bedrock AgentCore</strong> - Pipecat voice agents now run on AgentCore Runtime with bidirectional WebSocket and WebRTC. Supports barge-in. Ready-to-deploy examples with Pipecat, Nova Sonic, LiveKit, and Strands SDK. (<a href="https://aws.amazon.com/blogs/machine-learning/deploy-voice-agents-with-pipecat-and-amazon-bedrock-agentcore-runtime-part-1/">AWS blog</a>)</p></li><li><p><strong>Genesys reports record Q4</strong> - Genesys Cloud at ~$2.6B ARR, 35%+ YoY growth. 70%+ of customers now on AI. AI-powered conversations up 120% YoY. AI is 20% of new ACV, with 10+ deals where AI exceeded half the contract value. (<a href="https://www.genesys.com/company/newsroom/announcements/genesys-reports-record-fourth-quarter-as-organizations-accelerate-the-adoption-of-ai-powered-experience-orchestration">Genesys</a>)</p></li><li><p><strong>Artificial Analysis updates voice benchmarks</strong> - AA-WER v2.0 adds conversational AI, EU Parliament speech, and financial call datasets. ElevenLabs Scribe v2 leads at 2.3% WER. Best value: Mistral Voxtral Small at 3.0% WER / $4 per 1K min. TTS Arena: Inworld TTS-1.5-Max at #1, ELO 1,160. (<a href="https://x.com/ArtificialAnlys/status/2037195442489090485?s=20">X post</a>)</p></li><li><p><strong>AI chatbots handle 60%+ of banking support</strong> - BofA Erica: 1.5B+ interactions, 98% resolved without human. Klarna AI: 66% of inquiries, saving $40M/yr. Gartner projects $80B in contact center labor cost cuts in 2026. (<a href="https://techbullion.com/why-ai-chatbots-are-handling-over-60-of-banking-customer-support/">TechBullion</a>)</p></li><li><p><strong>The economics of AI vs human agents</strong> - Voice AI now costs ~$0.40/call vs $7-12 for a human agent: 90-95% cost reduction per interaction. 
Analysis of how this is reshaping contact center staffing. (<a href="https://medium.datadriveninvestor.com/the-silence-of-the-call-center-openai-just-cut-40-of-call-center-jobs-in-one-week-df82cc10fc61">Medium</a>)</p></li><li><p><strong>Agentic Voice AI goes mainstream</strong> - 1 in 10 customer service interactions projected to be fully automated by agentic voice AI in 2026. 80% of businesses plan to deploy. RingCentral shipped AIR Pro, an agentic voice platform embedded in its comms stack. (<a href="https://telecomreseller.com/2026/03/24/agentic-voice-ai-for-business/">Telecom Reseller</a>)</p></li><li><p><strong>Salesforce Agentforce Contact Center</strong> - Native CCaaS unifying voice, digital channels, CRM, and AI agents in one stack. Voice now built into the CRM on Hyperforce. GA since Feb 23. (<a href="https://cloudwars.com/ai/salesforce-agentforce-contact-center-brings-unified-data-and-ai-agents-to-customer-service/">Cloud Wars</a>)</p></li><li><p><strong>Otter.ai hits 35M users, $100M ARR</strong> - Sam Liang interview. $100M ARR with &lt;200 employees ($500K+ rev/employee). #14 on Forbes 2026 Best Startup Employers. Liang: 2026 is &#8220;the year of the voice.&#8221; (<a href="https://youtu.be/7yMetPnsFT0">YouTube</a>)</p></li></ul><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://voice-ai-newsletter.krisp.ai/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe now&quot;,&quot;action&quot;:null,&quot;class&quot;:&quot;button-wrapper&quot;}" data-component-name="ButtonCreateButton"><a class="button primary button-wrapper" href="https://voice-ai-newsletter.krisp.ai/subscribe?"><span>Subscribe now</span></a></p><div><hr></div><h2>Engineering Corner &#128526;</h2><ul><li><p><strong>Gladia open-sources WER normalization library</strong> - Normalizes transcripts before computing WER to eliminate false penalties from formatting differences (&#8220;$50&#8221; vs &#8220;fifty dollars&#8221;). 
Configurable YAML pipelines for fair cross-engine ASR comparison. (<a href="https://github.com/gladiaio/normalization">GitHub</a>) (<a href="https://www.linkedin.com/posts/gevorg-minasyan-42475a132_tts-normalization-comparisons-activity-7442931104967319552-Baog/">LinkedIn - Gevorg Minasyan</a>)</p></li></ul><ul><li><p><strong>MacWhisper</strong> - Mac-native local transcription using Whisper and Nvidia Parakeet. 300K copies sold. Batch processing, YouTube transcription, auto-recording Zoom/Teams/Webex. All on-device. (<a href="https://www.trendhunter.com/amp/trends/macwhisper">Trend Hunter</a>)</p></li><li><p><strong>Logan Kilpatrick on Gemini 3 Flash</strong> - Google DeepMind&#8217;s Logan Kilpatrick discusses the latest Gemini model capabilities. (<a href="https://x.com/OfficialLoganK/status/2037187750005240307?s=20">X post</a>)</p></li><li><p><strong>Google Docs adds Gemini-powered audio proofreading</strong> - &#8220;Listen to this&#8221; reads docs aloud with AI voices. 0.5x-2x playback. Also ships audio summaries: condenses long docs into ~3min podcast-style recaps. Desktop, English only for now. (<a href="https://www.makeuseof.com/google-docs-hidden-audio-feature-proofread/">MakeUseOf</a>)</p></li><li><p><strong>Rekam AI</strong> - All-in-one voice platform: TTS, STT, voice cloning, custom voice creation. 2,000+ voices, 20+ languages. Free unlimited tier for Kokoro models. (<a href="https://dynamicbusiness.com/ai-tools/rekam-ai-ai-voice-platform-overview.html">Dynamic Business</a>)</p></li><li><p><strong>Klassifier</strong> - AI-powered audio classification tool. (<a href="https://www.trendhunter.com/trends/klassifier">Trend Hunter</a>)</p></li><li><p><strong>ViciStack on call center AI voice agents</strong> - Overview of real-time conversation handling, reduced wait times, and automated workflows in production contact centers. 
(<a href="https://vicistack.com/blog/call-center-ai-voice-agents">ViciStack</a>)</p></li></ul><div><hr></div><p></p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://resources.krisp.ai/fullband-2025&quot;,&quot;text&quot;:&quot;Register now&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://resources.krisp.ai/fullband-2025"><span>Register now</span></a></p><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://voice-ai-newsletter.krisp.ai/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">Get the most important news in Voice AI delivered directly to your inbox every week</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div>]]></content:encoded></item><item><title><![CDATA[Scaling STT systems | Maxime Gaudin (CTO at Gladia)]]></title><description><![CDATA[Watch now | In the Future of Voice AI series of interviews, I ask three questions to my guests: - What problems do you currently see in Enterprise Voice AI? - How does your company solve these problems? 
- What solutions do you envision in the next 5 years?]]></description><link>https://voice-ai-newsletter.krisp.ai/p/scaling-stt-systems-maxime-gaudin</link><guid isPermaLink="false">https://voice-ai-newsletter.krisp.ai/p/scaling-stt-systems-maxime-gaudin</guid><dc:creator><![CDATA[Davit Baghdasaryan]]></dc:creator><pubDate>Thu, 26 Mar 2026 13:10:44 GMT</pubDate><enclosure url="https://api.substack.com/feed/podcast/191781305/5ff3cfd6530c40911705194eb9bac5f9.mp3" length="0" type="audio/mpeg"/><content:encoded><![CDATA[<pre><code><code>In the Future of Voice AI series of interviews, I ask three questions to my guests:

- What problems do you currently see in Enterprise Voice AI?
- How does your company solve these problems?
- What solutions do you envision in the next 5 years?</code></code></pre><p>This episode&#8217;s guest is <a href="https://www.linkedin.com/in/maxime-gaudin/">Maxime Gaudin</a>, CTO at <a href="https://www.gladia.io/">Gladia</a>.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!M_NR!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2a26818e-78f7-4be0-86b5-1468d7f09021_1200x1200.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!M_NR!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2a26818e-78f7-4be0-86b5-1468d7f09021_1200x1200.png 424w, https://substackcdn.com/image/fetch/$s_!M_NR!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2a26818e-78f7-4be0-86b5-1468d7f09021_1200x1200.png 848w, https://substackcdn.com/image/fetch/$s_!M_NR!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2a26818e-78f7-4be0-86b5-1468d7f09021_1200x1200.png 1272w, https://substackcdn.com/image/fetch/$s_!M_NR!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2a26818e-78f7-4be0-86b5-1468d7f09021_1200x1200.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!M_NR!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2a26818e-78f7-4be0-86b5-1468d7f09021_1200x1200.png" width="1200" height="1200" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/2a26818e-78f7-4be0-86b5-1468d7f09021_1200x1200.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1200,&quot;width&quot;:1200,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:512303,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://voice-ai-newsletter.krisp.ai/i/191781305?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2a26818e-78f7-4be0-86b5-1468d7f09021_1200x1200.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!M_NR!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2a26818e-78f7-4be0-86b5-1468d7f09021_1200x1200.png 424w, https://substackcdn.com/image/fetch/$s_!M_NR!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2a26818e-78f7-4be0-86b5-1468d7f09021_1200x1200.png 848w, https://substackcdn.com/image/fetch/$s_!M_NR!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2a26818e-78f7-4be0-86b5-1468d7f09021_1200x1200.png 1272w, https://substackcdn.com/image/fetch/$s_!M_NR!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2a26818e-78f7-4be0-86b5-1468d7f09021_1200x1200.png 1456w" sizes="100vw" fetchpriority="high"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" 
width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>Former Co-Founder &amp; CTO at Matcha and CTO at MadKudu through its private equity acquisition. Among the first employees at Malt, where he helped scale the company from 6 to 250 people and over &#8364;12M in monthly transaction volume. He earned his Master's degree in Computer Science from INSA Lyon and Polytechnique Montr&#233;al, Canada. Throughout his career, he has built and scaled products across B2B SaaS, data intelligence, and speech AI, from early-stage founding to leading engineering organizations through hypergrowth and acquisitions.</p><p><a href="http://www.gladia.io">Gladia</a> was founded in 2022 by Jean-Louis Queguiner and Jonathan Soto with a mission to help companies leverage cutting-edge AI and retrieve actionable insights from audio data. 
Its API supports advanced speech recognition features in over 100 languages, with exceptional accuracy and asynchronous and real-time transcription.</p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://www.youtube.com/@futureofvoiceai&quot;,&quot;text&quot;:&quot;Listen on YouTube&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://www.youtube.com/@futureofvoiceai"><span>Listen on YouTube</span></a></p><h3><strong>Recap Video</strong></h3><div class="native-video-embed" data-component-name="VideoPlaceholder" data-attrs="{&quot;mediaUploadId&quot;:&quot;9bea8880-52b7-4f24-a617-2bc892ef35a1&quot;,&quot;duration&quot;:null}"></div><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://voice-ai-newsletter.krisp.ai/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">Thanks for reading Voice AI Newsletter! 
Subscribe for free to receive weekly updates.</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div><h3><strong>Takeaways</strong></h3><ul><li><p>Winning isn&#8217;t just about model quality; it is about surviving brutal tradeoffs between latency, cost, and scale.</p></li><li><p>The real challenge is not training one great model; it is running it cheaply enough to meet market pricing without breaking performance.</p></li><li><p>STT is getting commoditized so fast that providers have to chase better accuracy while selling at margins that keep shrinking.</p></li><li><p>Big models don&#8217;t matter if they are too expensive to run at scale.</p></li><li><p>Real-time voice AI lives or dies under a hard latency budget, and staying under 300 milliseconds leaves little room for mistakes.</p></li><li><p>The industry obsession with one model that does everything may be the wrong path if smaller specialist models can outperform it in the moments that matter.</p></li><li><p>Every model upgrade is risky because improving one language or task can make another one worse.</p></li><li><p>Testing speech systems is harder than people admit because teams know something broke but don&#8217;t know what.</p></li><li><p>General transcription errors can be patched by an LLM, but once a name, phone number, email, or address is lost, it is gone.</p></li><li><p>The next edge in voice AI may come from tiny models trained for high-value details like PII, not from one giant model trying to handle everything.</p></li><li><p>Email addresses sound simple until real accents, pauses, corrections, and spelling cues expose how messy spoken language really is.</p></li><li><p>The companies that win enterprise voice AI will be the 
ones that orchestrate many narrow models well, not the ones chasing a single universal model.</p></li><li><p>Infrastructure strategy is becoming a product decision because legal rules, traffic spikes, and customer use cases all change what &#8220;best&#8221; deployment looks like.</p></li><li><p>Cloud scaling breaks in real-time spikes, like emergency calls.</p></li><li><p>Using managed infra and large DevOps teams at once wastes money.</p></li><li><p>Customers want one vendor for everything, even if quality drops.</p></li><li><p>The market will reward depth over breadth if a vendor can become truly exceptional in one painful, business-critical part of the voice stack.</p></li></ul>]]></content:encoded></item><item><title><![CDATA[Scale AI launches real-world voice AI benchmark]]></title><description><![CDATA[Voice AI weekly digest]]></description><link>https://voice-ai-newsletter.krisp.ai/p/scale-ai-launches-real-world-voice</link><guid isPermaLink="false">https://voice-ai-newsletter.krisp.ai/p/scale-ai-launches-real-world-voice</guid><dc:creator><![CDATA[Davit Baghdasaryan]]></dc:creator><pubDate>Mon, 23 Mar 2026 14:02:58 GMT</pubDate><enclosure url="https://substack-post-media.s3.amazonaws.com/public/images/e054f56f-17b9-4ab0-9d17-c2b2b5aa237b_1880x1022.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<h2>Top Updates &#128170;</h2><ul><li><p>Scale AI launches the first real-world voice AI benchmark (<a href="https://venturebeat.com/data/scale-ai-launches-voice-showdown-the-first-real-world-benchmark-for-voice-ai">VentureBeat</a>)</p></li><li><p>NVIDIA releases Nemotron 3 VoiceChat speech-to-speech model (<a href="https://x.com/ArtificialAnlys/status/2033642073052868861?s=20">X</a>)</p></li><li><p>Krisp launches MCP integration with Claude (<a href="https://www.linkedin.com/feed/update/urn:li:activity:7440383581539033088/">LinkedIn</a>)</p></li><li><p>Amazon Connect voice AI agents now support 13 new languages (<a 
href="https://aws.amazon.com/about-aws/whats-new/2026/03/amazon-connect-voice-ai-agents-13-languages/">AWS</a>)</p></li><li><p>Modulate launches Velma Transcribe: High-performance transcription for real-world conversations at 90% lower cost (<a href="https://www.enterprisenews.com/press-release/story/89859/modulate-launches-velma-transcribe-high-performance-transcription-for-real-world-conversations-at-90-lower-cost/">Enterprise News</a>)</p></li><li><p>Google News could soon give you a convenient new way to consume its audio briefings (<a href="https://www.androidauthority.com/google-news-read-ai-audio-briefings-transcript-apk-teardown-3649402/">Android Authority</a>)</p></li><li><p>AI notetaking devices that record and transcribe your meetings (<a href="https://techcrunch.com/2026/03/20/ai-notetaker-hardware-devices-pins-pendants-record-transcribe/">TechCrunch</a>)</p></li><li><p>Krisp has been named a Palomarr Leader across Accent Conversion, Noise Cancellation, Voice Translation (<a href="https://www.linkedin.com/feed/update/urn:li:activity:7439664507603357696/">LinkedIn</a>)</p></li><li><p>Amazon Connect adds new generative TTS voices and expands regions (<a href="https://aws.amazon.com/about-aws/whats-new/2026/03/amazon-connect-adds-generative-text-to-speech-voices/">AWS</a>)</p></li><li><p>Ringover launches enhanced AI assistant ask Empower 2.0 (<a href="https://aithority.com/machine-learning/ringover-launches-enhanced-ai-assistant-ask-empower-2-0/">AIThority</a>)</p></li><li><p>WhatsApp upgrade &#8212; calls will sound completely different (<a href="https://nokiapoweruser.com/whatsapp-just-got-a-game-changing-upgrade-calls-will-sound-completely-different/">Nokia Power User</a>)</p></li><li><p>8x8 Engage launches globally for frontline teams (<a href="https://www.cmswire.com/customer-experience/8x8-engage-launches-globally-for-frontline-teams/">CMSWire</a>)</p></li><li><p>Itel unveils Zeno AI Weaver voice recorder in India (<a 
href="https://www.gadgets360.com/ai/news/itel-zeno-ai-weaver-voice-recorder-price-in-india-unveil-specifications-features-11233496">Gadgets360</a>) </p></li><li><p>AI voice cloning &amp; synthesis are shaping the future of digital voices (<a href="https://www.techtimes.com/articles/315169/20260317/ai-voice-cloning-voice-synthesis-technology-are-shaping-future-digital-voices.htm">TechTimes</a>)</p></li><li><p>How businesses are replacing IVR with conversational AI (<a href="https://socialmediaexplorer.com/business-innovation-2/ai-voice-agents-in-2026-how-businesses-are-replacing-ivr-with-conversational-ai-that-actually-works/">Social Media Explorer</a>)</p></li><li><p>Bandicam launches AI feature to transcribe video to text on Mac (<a href="https://martechseries.com/video/bandicam-launches-ai-feature-to-transcribe-video-to-text-on-mac/">MarTech Series</a>)</p></li><li><p>The mounting cost of voice fraud: revenue loss, broken trust (<a href="https://www.retaildive.com/spons/the-mounting-cost-of-voice-fraud-revenue-loss-broken-trust-and-operationa/814409/">Retail Dive</a>)</p></li><li><p>Robinhood&#8217;s startup fund invests $35M in Stripe and AI audio firm (<a href="https://www.theblock.co/amp/post/393910/robinhoods-startup-fund-invests-roughly-35-million-across-stripe-and-ai-audio-firm">The Block</a>)</p></li><li><p>Ezra raises $3.2M in seed funding (<a href="https://www.finsmes.com/2026/03/ezra-raises-3-2m-in-seed-funding.html">FinSMEs</a>)</p></li><li><p>WellSaid closes venture debt funding (<a href="https://www.finsmes.com/2026/03/wellsaid-closes-venture-debt-funding.html">FinSMEs</a>)</p></li></ul><p></p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://voice-ai-newsletter.krisp.ai/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe now&quot;,&quot;action&quot;:null,&quot;class&quot;:&quot;button-wrapper&quot;}" data-component-name="ButtonCreateButton"><a class="button primary button-wrapper" 
href="https://voice-ai-newsletter.krisp.ai/subscribe?"><span>Subscribe now</span></a></p><div><hr></div><h2>Engineering Corner &#128526;</h2><ul><li><p>VoXtream2: Full-stream TTS with dynamic speaking-rate control (<a href="https://www.linkedin.com/posts/ntorgashov_tts-texttospeech-streaming-activity-7439675674191147008-BOlV/">LinkedIn</a>)</p></li><li><p>Adaptive AI voice layer for real-time communication (<a href="https://dev.to/peacebinflow/adaptive-ai-voice-layerfor-real-time-communication-32gf">Dev</a>)</p></li><li><p>Utterly: Transcribe speech privately on Apple devices, offline (<a href="https://betalist.com/startups/utterly">BetaList</a>)</p></li><li><p>MiniMax 2.7: GLM-5 at 1/3 cost, SOTA open model (<a href="https://news.smol.ai/issues/26-03-18-not-much/">Smol AI News</a>)</p></li><li><p>Best STT APIs to build an AI notetaker in 2026 (<a href="https://hackernoon.com/best-speech-to-text-apis-to-build-an-ai-notetaker-in-2026">Hacker Noon</a>)</p></li><li><p>PersonaOps: A voice-to-data intelligence system powered by Notion MCP (<a href="https://dev.to/peacebinflow/personaopsa-voice-to-data-intelligence-systempowered-by-notion-mcp-4m2">Dev</a>)</p></li><li><p>Google AI releases WAXAL: Multilingual African speech dataset (<a href="https://www.marktechpost.com/2026/03/17/google-ai-releases-waxal-a-multilingual-african-speech-dataset-for-training-automatic-speech-recognition-and-text-to-speech-models/">MarktechPost</a>)</p></li><li><p>WhisperWeb processes STT directly within the browser (<a href="https://www.trendhunter.com/trends/local-ai-transcription">Trend Hunter</a>)</p></li><li><p>Why building voice AI agents is still so hard (<a href="https://dev.to/dograh/why-building-voice-ai-agents-is-still-so-hard-and-why-we-started-dograh-2gcc">Dev</a>)</p></li><li><p>OpenVoiceUI: AI voice agent app generates live canvas pages (<a 
href="https://dev.to/mcerqua/openvoiceui-ai-voice-agent-app-generates-live-canvas-pages-using-openclaw-33i9">Dev</a>)</p></li><li><p>Vietnamese automatic speech recognition (<a href="https://tldr.takara.ai/p/2603.14779">TLDR Takara</a>)</p></li><li><p>VoiceType AI transcribes, edits, and auto-formats your speech (<a href="https://www.trendhunter.com/trends/voicetype-ai">Trend Hunter</a>)</p></li><li><p>Speech synthesis API for TTS (<a href="https://dev.to/omriluz1/speech-synthesis-api-for-text-to-speech-1b1c">Dev</a>)</p></li></ul><div><hr></div><p></p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://resources.krisp.ai/fullband-2025&quot;,&quot;text&quot;:&quot;Register now&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://resources.krisp.ai/fullband-2025"><span>Register now</span></a></p><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://voice-ai-newsletter.krisp.ai/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">Get the most important news in Voice AI delivered directly to your inbox every week</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div>]]></content:encoded></item><item><title><![CDATA[Updates from Salesforce, RingCentral, MS and others!]]></title><description><![CDATA[Voice AI weekly digest]]></description><link>https://voice-ai-newsletter.krisp.ai/p/updates-from-salesforce-ringcentral</link><guid 
isPermaLink="false">https://voice-ai-newsletter.krisp.ai/p/updates-from-salesforce-ringcentral</guid><dc:creator><![CDATA[Davit Baghdasaryan]]></dc:creator><pubDate>Mon, 16 Mar 2026 14:01:40 GMT</pubDate><enclosure url="https://substack-post-media.s3.amazonaws.com/public/images/c85a9eb3-d05d-4cb0-a5f2-61faf1f2e0ba_1280x720.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<h2>Top Updates &#128170;</h2><ul><li><p>Salesforce launches Agentforce Contact Centre (<a href="https://cxm.world/customer-experience/salesforce-launches-agentforce-contact-centre-ai-voice-crm-finally-unified-in-one-ccaas-platform/">CXM World</a>)</p></li><li><p>RingCentral unveils AIR Pro at Enterprise Connect (<a href="https://www.cxtoday.com/contact-center/enterprise-connect-ringcentral-air-pro-voice-ai/">CX Today</a>)</p></li><li><p>Microsoft announces Custom Voice for Dynamics 365 Contact Center (<a href="https://www.microsoft.com/en-us/dynamics-365/blog/it-professional/2026/03/06/custom-neural-voices-dynamics-365-contact-center/">Microsoft</a>)</p></li><li><p>Intron launches voice AI supporting 57 African languages (<a href="https://kenyanwallstreet.com/intron-launches-new-voice-ai-service-sahara-v2">Kenyan Wallstreet</a>)</p></li><li><p>Krisp launches customer accent conversion for global contact centers (<a href="https://www.cxtoday.com/contact-center/krisp-customer-accent-conversion-contact-centers/">CX Today</a>)</p></li><li><p>Voice and language intelligence market size in 2026 (<a href="https://www.precedenceresearch.com/voice-and-language-intelligence-market">Precedence Research</a>)</p></li><li><p>Hume AI appoints new CEO (<a href="https://www.prnewswire.com/news-releases/hume-ai-appoints-new-ceo-302668103.html">PR Newswire</a>)</p></li><li><p>ElevenLabs pledges to restore 1 million voices at SXSW (<a href="http://findarticles.com/elevenlabs-pledges-to-restore-1-million-voices-at-sxsw/">FindArticles</a>)</p></li><li><p>AI customer support startup Wonderful AI raises $150 
million (<a href="https://www.bloomberg.com/news/articles/2026-03-12/ai-customer-support-startup-wonderful-ai-raises-150-million">Bloomberg</a>)</p></li><li><p>Devnagri AI launches multilingual enterprise speech AI (<a href="https://martechseries.com/predictive-ai/ai-platforms-machine-learning/devnagri-ai-launches-speech-ai-to-power-multilingual-voice-workflows-for-enterprises/">MarTech Series</a>)</p></li><li><p>Spectrum Business and RingCentral expand partnership (<a href="https://corporate.charter.com/newsroom/spectrum-business-and-ring-central-expand-partnership">Charter Corporate</a>)</p></li><li><p>CallMiner adds AI classifiers, custom summaries to CX platform (<a href="https://www.cmswire.com/contact-center/callminer-adds-ai-classifiers-custom-summaries-to-cx-platform/">CMSWire</a>)</p></li><li><p>Sakura adds speech synthesis API to AI platform (<a href="https://www.telecompaper.com/news/sakura-adds-speech-synthesis-api-to-ai-platform-launches-research-notebook-beta--1564747">Telecompaper</a>)</p></li><li><p>Agora removes barriers to scalable voice AI agents (<a href="https://www.globenewswire.com/news-release/2026/03/11/3253909/0/en/Agora-Removes-Barriers-to-Scalable-Voice-AI-Agents.html">Globe Newswire</a>)</p></li><li><p>ThinkrrAI advances its voice AI strategy (<a href="https://www.manilatimes.net/2026/03/08/tmt-newswire/globenewswire/thinkrrai-advances-its-voice-ai-strategy-under-cmo-cody-getchell-amid-growing-demand-for-ai-driven-automation/2295418">Manila Times</a>)</p></li><li><p>How voicemail-to-email transcription can create privacy exposure (<a href="https://www.paubox.com/blog/how-voicemail-to-email-transcription-can-create-privacy-exposure">Paubox</a>)</p></li><li><p>Outbound AI voice agents in Vodia v70 (<a href="https://telecomreseller.com/2026/03/11/outbound-ai-voice-agents-in-vodia-v70/">Telecom Reseller</a>)</p></li><li><p>Conversational AI solutions: Benefits, challenges &amp; best practices (<a 
href="https://www.nextiva.com/blog/conversational-ai-solutions.html">Nextiva</a>)</p></li><li><p>AI ring startup takes on OpenAI and Meta in wearables (<a href="https://www.upstartsmedia.com/p/sandbar-stream-ai-ring-raises-23m">Upstarts Media</a>)</p></li><li><p>Together AI launches voice agent platform with sub-700ms latency (<a href="https://www.mexc.com/news/917035">MEXC</a>)</p></li><li><p>Sinch unveils Voice Relay to power AI-driven calls (<a href="https://telconews.com.au/story/sinch-unveils-voice-relay-to-power-ai-driven-calls">Telco News</a>)</p></li><li><p>Ex-Apple engineer&#8217;s voice-only pendant raises $5M (<a href="https://www.techbuzz.ai/articles/ex-apple-engineer-s-voice-only-pendant-raises-5m">TechBuzz AI</a>)</p></li></ul><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://voice-ai-newsletter.krisp.ai/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe now&quot;,&quot;action&quot;:null,&quot;class&quot;:&quot;button-wrapper&quot;}" data-component-name="ButtonCreateButton"><a class="button primary button-wrapper" href="https://voice-ai-newsletter.krisp.ai/subscribe?"><span>Subscribe now</span></a></p><div><hr></div><h2>Engineering Corner &#128526;</h2><ul><li><p>Hume AI: First open-source TTS model, TADA (<a href="https://x.com/hume_ai/status/2031401003078062578?s=20">X</a>)</p><div class="native-video-embed" data-component-name="VideoPlaceholder" data-attrs="{&quot;mediaUploadId&quot;:&quot;10fd5511-c88d-4bbe-853c-c348f6a7e58b&quot;,&quot;duration&quot;:null}"></div></li><li><p>How developers can bring voice AI into telephony applications (<a href="https://www.infoworld.com/article/4136039/how-developers-can-bring-voice-ai-into-telephony-applications.html">InfoWorld</a>)</p></li><li><p>This AI can hear, translate, and speak back in 100 languages (<a href="https://hackernoon.com/this-ai-can-hear-translate-and-speak-back-in-100-languages?source=rss">Hacker Noon</a>)</p></li><li><p>KrishokBondhu: A retrieval-augmented voice-based 
agricultural advisory call center for Bengali farmers (<a href="https://arxiv.org/abs/2510.18355">arXiv</a>)</p></li><li><p>Causal prosody mediation for TTS: Counterfactual training of duration, pitch, and energy in FastSpeech2 (<a href="https://tldr.takara.ai/p/2603.11683">TLDR Takara</a>)</p></li><li><p>The future of clearer speech is multimodal (<a href="https://hackernoon.com/the-future-of-clearer-speech-is-multimodal">Hacker Noon</a>)</p></li><li><p>Fish Audio S2, a new generation of expressive TTS with controllable emotion (<a href="https://x.com/FishAudio/status/2031411140820152560?s=20">X</a>)</p><div class="native-video-embed" data-component-name="VideoPlaceholder" data-attrs="{&quot;mediaUploadId&quot;:&quot;3502fd21-629a-41da-8a0a-925075508906&quot;,&quot;duration&quot;:null}"></div></li><li><p>JEPA-v0: Audio encoder for real-time speech translation (<a href="https://www.startpinch.com/research/en/jepa-encoder-translation/">StartPinch</a>)</p></li><li><p>Human brain and AI speech recognition decode speech similarly (<a href="https://techxplore.com/news/2026-03-human-brain-ai-speech-recognition.html">TechXplore</a>)</p></li><li><p>Cybersecurity and forensic audio analysis: Deepfake detection based on MFCC, audio-text disconsistency, and prosodic features (<a href="https://www.scirp.org/journal/paperinformation?paperid=150057">SCIRP</a>)</p></li><li><p>Voice isolation iPhone guide (<a href="https://thinkdesignblog.com/voice-isolation-iphone-guide/">Think Design Blog</a>)</p></li><li><p>Gemini embedding 2: Natively multimodal embedding model (<a href="https://blog.google/innovation-and-ai/models-and-research/gemini-models/gemini-embedding-2/">Google Blog</a>)</p></li><li><p>Building a TTS engine in pure C (<a href="https://dev.to/gabrielemastrapasqua/building-a-text-to-speech-engine-in-pure-c-59h4">Dev</a>)</p></li></ul><div><hr></div><p></p><p class="button-wrapper" 
data-attrs="{&quot;url&quot;:&quot;https://resources.krisp.ai/fullband-2025&quot;,&quot;text&quot;:&quot;Register now&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://resources.krisp.ai/fullband-2025"><span>Register now</span></a></p><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://voice-ai-newsletter.krisp.ai/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">Get the most important news in Voice AI delivered directly to your inbox every week</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div>]]></content:encoded></item><item><title><![CDATA[Crazy Week 🔥 Updates from Anthropic, OpenAI, Krisp, Assembly and much more! 
]]></title><description><![CDATA[Voice AI weekly digest]]></description><link>https://voice-ai-newsletter.krisp.ai/p/crazy-week-updates-from-anthropic</link><guid isPermaLink="false">https://voice-ai-newsletter.krisp.ai/p/crazy-week-updates-from-anthropic</guid><dc:creator><![CDATA[Davit Baghdasaryan]]></dc:creator><pubDate>Mon, 09 Mar 2026 14:03:10 GMT</pubDate><enclosure url="https://substack-post-media.s3.amazonaws.com/public/images/1c900b4a-e7f2-4f24-b0f3-6c4aa1cee74a_1456x816.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<h2>Top Updates &#128170;</h2><ul><li><p>Assembly launches Universal-3 Pro streaming (<a href="https://www.assemblyai.com/blog/universal-3-pro-streaming">Assembly</a>)</p></li><li><p>Anthropic launches voice mode for Claude Code (<a href="https://mlq.ai/news/anthropic-launches-voice-mode-for-claude-code/">MLQ</a>)</p></li><li><p>OpenAI develops a &#8216;bidirectional&#8217; audio model to boost voice assistants (<a href="https://www.theinformation.com/newsletters/ai-agenda/openai-develops-bidirectional-audio-model-boost-voice-assistants">The Information</a>)</p></li><li><p>Krisp launches listener-side, real-time Accent Conversion (<a href="https://siliconangle.com/2026/03/03/krisp-launches-listener-side-real-time-accent-conversion/">SiliconANGLE</a>)</p></li><li><p>Huawei launches next-generation voice virtual agents (<a href="https://www.huawei.com/en/news/2026/3/mwc-voice-interaction-aicc">Huawei</a>)</p></li><li><p>Modulate adds nuance to voice analysis (<a href="https://www.nojitter.com/ai-automation/modulate-adds-nuance-to-voice-analysis">NoJitter</a>)</p></li><li><p>Deutsche Telekom partners with ElevenLabs to bring AI assistant to calls (<a 
href="https://www.wired.com/story/deutsche-telekom-elevenlabs-ai-phone-calls-mwc-2026/">Wired</a>)</p></li><li><p>Alibaba Tongyi unveils Fun-CosyVoice3.5 and Fun-AudioGen-VD with FreeStyle voice generation (<a href="https://pandaily.com/alibaba-tongyi-unveils-fun-cosy-voice3-5-and-fun-audio-gen-vd-with-free-style-voice-generation">Pandaily</a>)</p></li><li><p>Voice AI platform VoiceLine raises 10M EUR in series A (<a href="https://slator.com/voiceline-raises-10m/">Slator</a>)</p></li><li><p>LevelAI expands agentic CX platform (<a href="https://customerservicemanager.com/level-ai-expands-agentic-cx-platform-to-deliver-human-quality-virtual-agents/">Customer Service Manager</a>)</p></li><li><p>Talkdesk CX accelerates patient access with agentic AI (<a href="https://www.globenewswire.com/news-release/2026/03/05/3250425/0/en/Talkdesk-Customer-Experience-Automation-accelerates-patient-access-with-agentic-AI-orchestration.html">GlobeNewswire</a>)</p></li><li><p>Syntiant to showcase always-on AI voice solutions (<a href="https://www.globenewswire.com/news-release/2026/03/05/3250619/0/en/Syntiant-to-Showcase-Always-On-AI-Voice-Solutions-at-Embedded-World-2026-with-Seltech.html">GlobeNewswire</a>)</p></li><li><p>ElevenLabs &amp; Google dominate Artificial Analysis&#8217; STT benchmark (<a href="https://the-decoder.com/elevenlabs-and-google-dominate-artificial-analysis-updated-speech-to-text-benchmark/">The Decoder</a>)</p></li><li><p>DiligenceSquared uses AI to make M&amp;A research affordable (<a href="https://techcrunch.com/2026/03/05/diligencesquared-uses-ai-voice-agents-to-make-ma-research-affordable/">TechCrunch</a>)</p></li><li><p>3CLogic chosen to enhance ServiceNow-driven managed services (<a href="https://www.prnewswire.com/news-releases/3clogic-chosen-by-apex-systems-to-enhance-servicenow-driven-managed-services-302701229.html">PR Newswire</a>)</p></li><li><p>AI vocal cloning and the limits of voice-based authentication (<a 
href="https://bisi.org.uk/reports/when-voice-is-no-longer-proof-ai-vocal-cloning-and-the-limits-of-voice-based-authentication">BISI</a>)</p></li><li><p>How large-scale speech models will impact voice AI (<a href="https://www.forbes.com/councils/forbestechcouncil/2026/02/26/how-large-scale-speech-models-will-impact-voice-ai/">Forbes</a>)</p></li><li><p>Why advanced voice agents require owning the voice stack (<a href="https://www.callcentrehelper.com/performance-voice-agents-voice-stack-271986.htm">Call Centre Helper</a>)</p></li><li><p>iFLYTEK Globally Launches AI Glasses and AI Interpret Mic (<a href="https://www.globenewswire.com/news-release/2026/03/05/3250195/0/en/iFLYTEK-Globally-Launches-AI-Glasses-and-AI-Interpret-Mic-Showcasing-Full-Scenario-AI-Translation-Solutions-at-MWC26.html">GlobeNewswire</a>)</p></li><li><p>Meeami Technologies, Alif Semiconductor to demonstrate ultra-efficient edge AI noise suppression (<a href="https://www.blufftontoday.com/press-release/story/57127/meeami-technologies-alif-semiconductor-to-demonstrate-ultra-efficient-edge-ai-noise-suppression-at-embedded-world-2026/">Bluffton Today</a>)</p></li><li><p>Sensory brings always-on AI speech and biometrics to Snapdragon Wear Elite (<a href="https://www.democratandchronicle.com/press-release/story/159304/sensory-brings-always-on-ai-speech-and-biometrics-to-snapdragon-wear-elite/">Democrat and Chronicle</a>)</p></li></ul><p></p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://voice-ai-newsletter.krisp.ai/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe now&quot;,&quot;action&quot;:null,&quot;class&quot;:&quot;button-wrapper&quot;}" data-component-name="ButtonCreateButton"><a class="button primary button-wrapper" href="https://voice-ai-newsletter.krisp.ai/subscribe?"><span>Subscribe now</span></a></p><div><hr></div><h2>Engineering Corner &#128526;</h2><ul><li><p>Spectre I, the first smart device to stop unwanted audio recordings (<a 
href="https://x.com/aidaxbaradari/status/2028864606568067491?s=20">X</a>)</p><div class="native-video-embed" data-component-name="VideoPlaceholder" data-attrs="{&quot;mediaUploadId&quot;:&quot;ad3c2fdf-a0e1-41a5-afd7-21720152a267&quot;,&quot;duration&quot;:null}"></div></li><li><p>Google releases <a href="https://x.com/GoogleResearch/status/2030012702668865784?s=20">WAXAL</a>. This open-access dataset delivers 2,400+ hours of high-quality speech data for 27 Sub-Saharan African languages, serving 100M+ speakers</p></li><li><p>Introducing KokoClone: Kokoro TTS, but it clones voices now (<a href="https://www.reddit.com/r/StableDiffusion/comments/1rjsgtd/kokoro_tts_but_it_clones_voices_now_introducing/">Reddit</a>)</p></li><li><p>VietSuperSpeech: A large-scale Vietnamese conversational speech dataset (<a href="https://arxiv.org/abs/2603.01894">arXiv</a>)</p></li><li><p>ZeSTA: Zero-shot TTS augmentation with domain-conditioned training for data-efficient personalized speech synthesis (<a href="https://tldr.takara.ai/p/2603.04219">Takara TLDR</a>)</p></li><li><p>How to compare latency and accuracy in voice recognition (<a href="https://www.goodcall.com/voice-ai/how-to-compare-latency-and-accuracy-in-voice-recognition">Goodcall</a>)</p></li><li><p>FineVoice review: Voice cloning in 30 seconds (<a href="https://www.unite.ai/finevoice-review/">Unite.AI</a>)</p></li><li><p>Improving automatic speech recognition for kids (<a href="https://drivendata.co/blog/child-asr-word-benchmark">DrivenData</a>)</p></li><li><p>Comparing STT algorithms for transcribing survey voice data (<a href="https://academic.oup.com/poq/article-abstract/89/4/1154/8418151?login=false">Oxford Academic</a>)</p></li><li><p>Top 10 voice AI agent platforms: Features, pros, cons &amp; comparison (<a href="https://www.bestdevops.com/top-10-voice-ai-agent-platforms-features-pros-cons-comparison/">Best DevOps</a>)</p></li><li><p>Best voice AI for fraud detection workflows (<a 
href="https://www.goodcall.com/voice-ai/best-voice-ai-for-fraud-detection-workflows">Goodcall</a>)</p></li></ul><div><hr></div><p></p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://resources.krisp.ai/fullband-2025&quot;,&quot;text&quot;:&quot;Register now&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://resources.krisp.ai/fullband-2025"><span>Register now</span></a></p><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://voice-ai-newsletter.krisp.ai/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">Get the most important news in Voice AI delivered directly to your inbox every week</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div>]]></content:encoded></item><item><title><![CDATA[Krisp Introduced Listener-Side Accent Conversion]]></title><description><![CDATA[A real-time Voice AI layer that improves understanding across meetings, contact centers, and AI agents]]></description><link>https://voice-ai-newsletter.krisp.ai/p/krisp-introduced-listener-side-accent</link><guid isPermaLink="false">https://voice-ai-newsletter.krisp.ai/p/krisp-introduced-listener-side-accent</guid><dc:creator><![CDATA[Davit Baghdasaryan]]></dc:creator><pubDate>Tue, 03 Mar 2026 16:19:42 GMT</pubDate><enclosure url="https://api.substack.com/feed/podcast/189778557/574b54a15f1796935f66600ebefe240c.mp3" length="0" type="audio/mpeg"/><content:encoded><![CDATA[<h3>Krisp just launched a revolutionary 
new technology: Listener-side Accent Conversion.</h3><p>Krisp now supports bidirectional Accent Conversion, which improves clarity on both sides of a live conversation. Krisp describes this as an industry first.</p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://krisp.ai/ai-accent-conversion/listener/&quot;,&quot;text&quot;:&quot;See what we built&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://krisp.ai/ai-accent-conversion/listener/"><span>See what we built</span></a></p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://www.linkedin.com/feed/update/urn:li:activity:7434599180188180480/&quot;,&quot;text&quot;:&quot;Read Arto&#8217;s LinkedIn post&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://www.linkedin.com/feed/update/urn:li:activity:7434599180188180480/"><span>Read Arto&#8217;s LinkedIn post</span></a></p><p></p>]]></content:encoded></item></channel></rss>