Huge updates from OpenAI, Meta, Google, NVIDIA and others 🔥

Davit Baghdasaryan

Sep 30, 2024

OpenAI rolls out Advanced Voice Mode with more voices and a new look

Top Updates 💪

Meta releases Llama 3.2 — and gives its AI a voice

NVIDIA launches translation AI and multilingual speech microservices

NotebookLM enhances AI note-taking with YouTube and audio file sources
Meta launches AI Dubbing and Speech Translation
Unlimited possibilities for service providers in conversational AI
These celebrities loaned their voices to Meta AI's new speech feature
Jingzhunxue debuts open-source speech LLM FlowMirror
TwilioGPT modernizing telephone and voice systems using LLMs and NLP
Doctorpresso debuts depression-detecting voice journaling app
AI detects hypertension in voice recordings
Yubi and AI4Bharat to build India’s first ASR engine for financial inclusion

Google suite leverages conversational AI for customer support
MindsDB launches conversational enterprise-ready AI that shows how it thinks

Best of show winner Illuma Labs raises $9 Million in Series A funding
Prepared, which wants to ‘revolutionize’ emergency 911 calls, raises $27M
AI-powered customer support startup Ujet raises $76M
Nurix AI raises $27.5M to scale development of custom enterprise AI agents
Max is getting Google AI-generated closed captions

Noteworthy 💪

AI-powered tech could help people with speech impairments to work remotely
A user tried Google's AI podcast creator and is now unsure what's real anymore
Improving voice recognition for people with speech disabilities
STT learns to understand people with Parkinson's disease—by listening to them
Bank warns of voice-based AI scams that could utilize your social media posts
Voice artists sue tech company for 'stealing their voices'

How does AI impact voice recognition systems? Trends and innovations
Multimodal LLMs in health care: Applications, challenges, and future outlook
How your brain tells speech and music apart
Can ChatGPT do reliable call center sentiment analysis?
The 4th revolution in customer experience & AI
Plaud takes a crack at a simpler AI pin

Science and Demo Corner 😎

Meet TEN, the world's first truly real-time multimodal agent framework
PDF2Audio: Convert PDFs to podcasts, lectures & more audio
Let’s talk about some cool Azure AI Speech SDK/API Endpoint
Using a speech language model that can listen while speaking

Novel MultiTalker speech recognition with speaker tokens
Contrastive speaker representation learning for speaker recognition
Exploring the world of open-source text-to-speech models
A novel AI approach that combines audio coding and source separation
Voice and chat agents using Amazon Connect, Amazon Lex, and Amazon Bedrock
Retirement: Conversation Transcription Multi Channel Diarization
