Updates from Amazon, MS, Stability, Podcastle, Sesame and more! 🔥

Mar 10, 2025

Top Updates 💪

How Amazon combines models, agents, and browser for smarter AI (VentureBeat)
Microsoft Dragon Copilot launches a unified voice AI for healthcare (Microsoft)
A year later, OpenAI still hasn’t released its voice cloning tool (TechCrunch)
Stability AI optimizes its audio model to run on ARM chips (TechCrunch)
Podcastle launches a TTS model with more than 450 AI voices (TechCrunch)
Talking with Sesame’s AI voice companion is amazing and creepy (ZDNet)
How AI is changing the way we communicate – the future of interaction (Forbes)
Deutsche Telekom and ElevenLabs announce strategic partnership to power AI-driven podcasting in Magenta App (ElevenLabs)
This AI meeting assistant is a must-have for remote work (MakeUseOf)
AI-powered chatbots revolutionizing customer experience in 2025 (SaveDelete)
Breaking language barriers with Intercom translations (AutoGPT)
Agora unveils AI engine for smooth voice interactions (PR Newswire)
Deepgram Nova-3 medical AI speech model reduces healthcare transcription errors (Artificial Intelligence News)
Fish Audio debuts Fish Speech 1.5 for realistic voice cloning (WICZ)
Whispp lets people with speech impairments speak in real-time using AI (Heise)
Audioshake unveils AI to separate overlapping voices (Podcasting Today)
Uniti AI raises $4M in seed funding (Finsmes)
HeyMilo raises $2.2M in seed funding (Finsmes)
AI at the microphone: the voice of the future? (HIIG Digital Society Blog)
Tightrope launches MediaScribe caption and translation service (TVNewsCheck)

Spark-TTS: an efficient LLM-based text-to-speech model with single-stream decoupled speech tokens (ArXiv)
SpeechCompass: enhancing mobile captioning with diarization and directional guidance via multi-microphone localization (ArXiv)
Nexus-O: an omni-perceptive and interactive model for language, audio, and vision (ArXiv)
Enhancing dictation with conformer-based speech recognition (Medium)
Large language model-based generative error correction (NVIDIA Research)
How to create a voice cloning AI model for realistic speech synthesis (Dev)
Meetily: the open-source, self-hosted alternative to Fireflies.ai (Dev)
Building a meeting summarizer backend with Python, FastAPI, AWS Transcribe, and Bedrock (Dev)
Top AI STT tools for researchers and educators (EducatorsTechnology)