Google presents on-device, real-time voice translation 👀; updates from DeepMind, Krisp, Zoom, NICE and others 🔥

Davit Baghdasaryan

Jun 24, 2024

Google presents the first on-device, real-time speech-to-speech translation model

Top Updates 💪

DeepMind’s new AI generates soundtracks and dialogue for videos
RingSense receives “Significant” sales performance enhancements

Meta has created a way to watermark AI-generated speech
Krisp launches AI Accent Localization SDK Early Access
Parsing Mpower: NICE’s integrated “CX AI” offering
HeyGen raises $60M in series A funding
Meta releases five new AI models for audio and visual research
Zoom adds new agent-assist, translation, & SMS capabilities to its Contact Center

Spring Labs introduces AI copilot for fintechs
IZEA introduces AI voice cloning and speech synthesis in FormAI
NXP introduces audio DSPs with AI audio functions for infotainment

Tandem Health raises $9.5M to scale its healthcare co-pilot
GreyLabs AI raises seed funding for speech analytics in banking and fintech

Hark, provider of a Voice of Customer (VoC) platform, raises $3.5M in seed funding

Noteworthy 💪

WhatsApp is working on a voice note transcription feature: here is how it may work
Listen to this page: Chrome's new text-to-speech feature
Voiceitt Chrome extension empowers people with speech disabilities
The power of AI transcription for streamlined communications
Vocal robots: Who am I speaking with?
How to build a startup in real-time AI speech translation
New research alert: How AI is changing employee and customer experiences

Science and Demo Corner 😎

Toucan TTS: MIT-licensed text-to-speech in 7,000 languages
Host the Whisper model with streaming mode on Amazon EKS and Ray Serve
Domain adaptive dual-relaxation regression for speech emotion recognition
AI-powered virtual assistants for businesses
Simplify transcription with Oracle Cloud Infrastructure generative AI and speech

Designing the API for building voice assistants, with Nikhil Gupta from Vapi
Seismic for meetings: What sales meetings have been missing
Waveform-domain speech enhancement using spectrogram encoding for ASR
DFNet: Decoupled fusion network for dialectal speech recognition
Adversarial meta sampling for multilingual low-resource speech recognition
Conformer-based speech recognition on extreme edge-computing devices
