Another amazing week in Voice AI

Voice AI weekly digest

Davit Baghdasaryan

Aug 18, 2025

Top Updates 💪

Salesforce & ServiceNow invest $1.5bn in Genesys (CX Today)
Apple’s new Siri may allow users to operate apps just using voice (TechCrunch)
NVIDIA releases open dataset, models for multilingual speech AI (NVIDIA Blog)
Voice AI is changing how we do business, starting with loans (Forbes)
Cerence Audio AI and HARMAN AudioworX boost in-car experiences (Nasdaq)
Pocket FM gives writers an AI tool to transform narratives & more (TechCrunch)
Aqua Voice shows just how good Mac dictation could be (9to5Mac)
Deepgram receives 2025 contact center technology award (Morningstar)
Krisp launches audio-only, small turn-taking model for voice AI agents (Krisp)
NiCE and Salesforce deepen end‑to‑end customer service partnership (No Jitter)
The Opus Research conversational AI / self‑service Intelliview 2025 (CX Today)
Singular Hearing launches HeardThat Plus to fill transcript gap (PR Newswire)
Palabra gets backing from Reddit co-founder’s venture firm (TechCrunch)
AI interpreting startup Kotoba raises $11.8M in seed (Slator)
Maven Voice brings human-like AI to support calls (PR Newswire)
Capacity acquires Call Criteria and Verbio Technologies (Smart Customer Service)
LG adds Cerence AI to smart TVs (TV Technology)
Nvidia aims to solve AI issues with many languages (Artificial Intelligence News)
Voice AI is becoming the streaming industry’s secret weapon (Streaming Media)
Google partners with projects to bridge Asian language barriers (Newsbytes.PH)

Our Latest Article

Krisp Accent Conversion v3.7, Major Leap in Naturalness and Stability

Davit Baghdasaryan

August 14, 2025

Krisp’s Accent Conversion technology has advanced rapidly since v3 launched in March 2025, when it became mature enough for wide-scale deployment.

Engineering Corner 😎

AI scans audio recordings to detect voice box cancer (The Scientist)
Whisfusion: Parallel ASR decoding via a diffusion transformer (arXiv)
SpeakerLM: End-to-end versatile speaker diarization and recognition with multimodal large language models (arXiv)
NTT’s 18 papers accepted for Interspeech 2025, the world’s largest international conference on spoken language processing (WebWire)
FFmpeg 8.0 merges OpenAI Whisper filter for automatic speech recognition (Phoronix)
Building IVR systems with Vodia’s JavaScript IVR (Telecom Reseller)
A multimodal affective interaction architecture integrating BERT‑based semantic understanding and VITS‑based emotional speech synthesis (MDPI)
Pitch accent detection improves pretrained ASR (arXiv)
Real‑time accessibility platform with Redis‑powered intelligence (DEV)
Snowflake AI_TRANSCRIBE: Transform audio to insights with SQL (DEV)
AVE Speech: A comprehensive multimodal dataset for speech recognition integrating audio, visual, and electromyographic signals (IEEE Xplore)
Building a production-ready STT system with fine-tuned Whisper (DEV)
Voice Embed transforms text to embeddable voice players (Trend Hunter)
Dubbing movies via hierarchical phoneme modeling and acoustic diffusion denoising (ResearchGate)
