In the Future of Voice AI interview series, I ask my guests three questions:
- What problems do you currently see in Enterprise Voice AI?
- How does your company solve these problems?
- What solutions do you envision in the next 5 years?
This episode’s guest is Alan Cowen, CEO and Chief Scientist at Hume AI.
Alan Cowen is the founder, CEO, and Chief Scientist of Hume AI, a research lab and technology company that launched EVI, the first emotionally intelligent voice-language AI model and API. Before Hume, he started the Affective Computing team at Google, was a researcher at Berkeley's Social Interaction Lab, and served as a scientific advisor to Facebook. An emotion and data scientist with 40+ peer-reviewed publications, Alan has spent his career researching human emotional experience and expression. His work gave rise to semantic space theory, a breakthrough computational approach to understanding how nuances of voice, face, body, and gesture are central to human connection and emotional experience. This theory forms the foundation of Hume's models, including OCTAVE, EVI, and the Expression Measurement API. Hume was founded in 2021 to optimize AI systems for human well-being and developed the first emotionally intelligent speech-language model, which is now driving a new frontier of voice AI.
Hume AI is dedicated to building AI that is directly optimized for human well-being. They are working on the next generation of a foundational audio-language model that drives an empathic AI assistant for any application. Their model understands subtle tones of voice, word emphasis, facial expressions, and more, along with the reactions of listeners. Hume calls learning from these signals "reinforcement learning from human expression" (RLHE). AI models trained with RLHE can serve as better question answerers, copywriters, tutors, call center agents, and more, even in text-only interfaces. Their goal is to enable a future in which technology draws on an understanding of human emotional expression to better serve human goals.
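To make the RLHE idea a bit more concrete, here is a minimal, hypothetical sketch of how expression-derived cues might be collapsed into a scalar reward that a learning system optimizes. Everything in it, the names (ExpressionSignal, expression_reward, pick_best_response), the cue set, and the weights, is an illustrative assumption, not Hume's actual pipeline or API.

```python
# Hedged sketch: one way the idea behind RLHE could look in code.
# All names and weights here are hypothetical, not Hume's implementation.
from dataclasses import dataclass
from typing import List


@dataclass
class ExpressionSignal:
    """Listener-reaction cues extracted from audio/video, assumed to be scores in [0, 1]."""
    vocal_warmth: float   # e.g., positive prosody in the listener's reply
    engagement: float     # e.g., back-channel cues, quick uptake
    frustration: float    # e.g., sighs, tense or clipped tone


def expression_reward(sig: ExpressionSignal) -> float:
    """Collapse expression cues into a single scalar a learner could optimize.
    The weights are arbitrary and purely illustrative."""
    return 0.4 * sig.vocal_warmth + 0.4 * sig.engagement - 0.6 * sig.frustration


def pick_best_response(candidates: List[str], reactions: List[ExpressionSignal]) -> str:
    """Toy stand-in for policy improvement: prefer the candidate whose observed
    listener reactions earned the highest reward. A real RLHE-style system would
    update model weights from such rewards rather than rank canned responses."""
    return max(zip(candidates, reactions), key=lambda pair: expression_reward(pair[1]))[0]


if __name__ == "__main__":
    candidates = ["Sure, I can help with that right away.", "That's not something I handle."]
    reactions = [ExpressionSignal(0.8, 0.7, 0.1), ExpressionSignal(0.2, 0.3, 0.9)]
    print(pick_best_response(candidates, reactions))  # prints the first candidate
```

In practice the reward would feed a fine-tuning loop rather than a ranking step, but the ranking version keeps the sketch self-contained.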
Recap Video
Takeaways
- Hume AI develops technology that analyzes voices, faces, and expressions to understand human emotions.
- Building emotionally intelligent AI requires deep learning combined with the science of human behavior to truly understand and respond to feelings.
- Hume's AI models aim to improve human well-being, not just detect emotions, which shows how important ethics are here.
- Real-time emotional AI has to respond almost instantly, which makes it hard to be both fast and accurate.
- AI that understands nonverbal cues can change how we communicate, especially in real-time situations.
- Recognizing emotions is more than spotting patterns; AI needs to know the context to get it right.
- Older methods of emotion detection often fail because they don't capture the full range of human reactions.
- Even though AI has come a long way, a lot of customer service still doesn't use language models, missing chances to make communication better.
- Good emotional AI shouldn't just read feelings but also predict how emotions change as they happen.
- To be effective, emotional AI has to keep learning so it can adjust to new situations and different people.
- Context is key, since the same emotion can mean different things depending on what's happening.
- There should be clear rules to make sure emotional AI focuses on helping people rather than just labeling feelings.
- AI that reads emotions should be open about how it works so people can trust it, especially when it's used in sensitive areas.
- Reading nonverbal cues could make customer service and healthcare better, but the AI has to handle the nuances of human expression.