6x faster Whisper 🔥 1M Zoom summaries! 🚀 New TTS from OpenAI 👀

Nov 08, 2023

Voice AI is destined to revolutionize how we communicate at work. Rapid innovation in Speech Recognition, Speech Synthesis, Speech Quality and LLMs is driving this disruption. The industry is already moving fast, and the pace will only increase over the next 5 years.

Last week was a tremendous example of this.

Top Updates 💪

OpenAI launched 3 Voice AI features:
- Whisper large-v3, featuring improved performance across languages. Improvements are shown here.
- A new TTS API with 6 preset voices. Its pricing appears to be 10x+ lower than the market 😯
- GPT-4 Turbo with a 128K context window, for higher-quality and cheaper meeting summaries
A new English-only Whisper model, Distil-Whisper, is claimed to be 6x faster 🚄 This is great news for on-device transcription!
Zoom reached 1M meeting summaries generated with its AI Companion
ElevenLabs launched Eleven Turbo v2, its fastest model so far, with audio generation times of ~400ms
Prevail integrated Krisp’s noise-cancellation AI 💪
Podcastle launched Magic Dust 🪄, a noise-cancellation feature for podcasts
Descript launched AI-generated podcast notes

Noteworthy 📝

The Text-to-Speech market is expected to grow to $17B by 2029 🚀
How to leverage Sentiment Analysis and voice data to obtain CX insights

Best Microphones 🎧 for Zoom, according to the CNET staff who use them
Yum is testing a voice-enabled AI drive-thru system in restaurants to increase productivity and provide automated upsell recommendations
6 Customer Service trends 📈 for 2024: AI chatbots, omnichannel, voice-based AI, automation, AR and personalization.
Azure AI Services introduced 7 new Text-to-Speech voices
Audio Hijack v4.3 has a new superpower: speech-to-text, powered by Whisper
Xenova announces a new Text-to-Speech Client tool, a robust and flexible AI platform for producing natural-sounding synthetic speech

Demos 😎

Quick demo of the 6x faster Distil-Whisper model
Talk with a LLaMA AI in your terminal: Whisper Medium + LLaMA v2 13B on an M2 Ultra.
Experimenting with the magic of open source! Whisper for translation, XTTS for audio, and Video-retalker for seamless lip sync in a short video.

Voice AI Newsletter
