Last week we hosted Fullband 2024, our third annual voice innovation conference.
During my keynote, I shared insights from the last 10 months of interviews with CX and tech leaders, and dozens of articles from the Voice AI Newsletter.
And Steve Morrell, Managing Director and Chief Analyst at ContactBabel, joined us to discuss the industry’s biggest trends, challenges, and opportunities.
We dove deep into the voice AI technology powering Krisp’s offerings and more.
The event was jam-packed with insights, innovative demos (which you can view below), real-world applications, and practical implications.
Attendees had so many great questions that we couldn’t answer them all live, so we’ve included them below.
Fullband Recap
Takeaways
Voice Channel Insights:
65% of customer interactions still happen through voice.
The big problem from the business side is that voice is by far the most expensive channel.
Whatever other channels a business offers, voice is where everything comes together. This is where customers are.
80% of customers prefer human support via voice.
Accents and audio quality remain key challenges, costing contact centers $5.5 billion annually.
Voice bots now handle up to 40% of simple tasks, freeing up agents for complex interactions.
Steve doubts voice bots will take over soon, given the industry's size and consumer preferences.
The biggest issue for customers is long queues, which heavily impact satisfaction and overall experience—and it's only getting worse.
Voice bots will be used much more powerfully for triage, determining whether an issue is high in emotion, urgency, or complexity, and routing it accordingly.
Customer Spotlight: Arrivia increased revenue by 17%, boosted customer NPS by 57%, and improved agent performance KPIs by 13% using Krisp’s AI Accent Localization.
Krisp announced product roadmap updates, including real-time transcripts, new languages supported, and enhanced voice AI capabilities.
Krisp Innovations
AI Noise Cancellation and Accent Localization
Krisp's AI-powered Noise Cancellation bidirectionally eliminates noise, echo, and background voices to enhance voice fidelity for clear communication. AI Noise Cancellation is the foundation that all other features and offerings are built on, ensuring the best possible voice quality, accuracy, and clarity.
AI Accent Localization dynamically converts an agent's accent into one the customer natively understands, using real-time inflection changes to improve comprehension.
AI Live Interpreter
Krisp AI Interpreter is an AI-powered tool designed for call centers to provide real-time language translation in over 20 languages. It requires no integration and works with all CX and voice platforms.
AI Agent Copilot
Krisp AI Agent Copilot is designed to assist call center agents by transcribing calls in real-time and generating post-call AI summaries so agents can focus more on the customer than note-taking.
Fullband 2024 Q&A
1. Based on the demo, the AI’s neutralized accent sounded quite natural. However, is there anything that could potentially make it sound unnatural? If so, what factors could contribute to that?
The truth is, this technology is complex and took over two years to develop.
Key factors that affect the naturalness of accent localization include:
The speed and pitch of speech and the thickness of the accent.
Naturalness and comprehension are subjective, varying significantly between listeners.
Many edge cases, like noise, need to be managed to avoid output issues.
Achieving perfection in real-time voice processing takes time, but we are optimistic about its progress and future performance.
2. How can we improve AI-based accent localization to ensure it captures and preserves the agent's tonal characteristics?
As presented during the roadmap showcase, we plan to introduce the Accent Reduction mode to the product. It aims to capture and preserve the agent’s tonal and emotional characteristics in speech.
This mode will be in production by Q4, offering enhanced accent localization capabilities.
3. How do agents feel about Accent Localization technology?
Agents love this technology, for several reasons:
More accessible communication: It helps agents be understood more clearly, reducing miscommunication and frustration.
Less bias: It minimizes the impact of customer bias based on accents, creating a more equal playing field.
Improved customer experience: Agents can focus on solving problems rather than repeating themselves or trying to forcefully neutralize their accents.
Less stress: Clear communication reduces stress for agents and reduces their cognitive load, leading to a better work environment.
Higher job satisfaction: Agents feel more appreciated and valued with fewer negative interactions. Our customers see an 8% increase in agent satisfaction for those using Krisp.
4. How does Krisp ensure the privacy of user data and audio recordings? What data does Krisp collect, and how is it used?
Krisp processes all audio data on your device's CPU. This means that no voice data ever leaves the device or is subject to cloud vulnerabilities. This unique architecture ensures maximum security and privacy for all Krisp users and their customers.
Krisp is SOC-2 Type II and GDPR compliant, and many customers use it within their HIPAA and PCI-compliant services.
The sole case where audio leaves the device is when using AI Live Interpreter. In that case, the audio is processed by third-party services that ensure high-level security.
Also, AI Agent Copilot summarizes the PII-redacted transcript in the cloud, but we plan to move call summarization on-device in the upcoming quarters.
Krisp never stores any audio or text data about calls, either on-device or in the cloud.
You can find more information on our Security page: https://krisp.ai/security/
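To make the PII-redaction step mentioned above more concrete, here is a minimal sketch of redacting identifiers from a transcript before any text is sent to a cloud summarizer. The patterns and placeholder labels are illustrative assumptions, not Krisp's actual redaction logic.

```python
import re

# Hypothetical PII patterns (illustrative only, not Krisp's pipeline).
PII_PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"),
    "PHONE": re.compile(r"\b\d{3}[-.]\d{3}[-.]\d{4}\b"),
    "CARD": re.compile(r"\b(?:\d{4}[ -]?){3}\d{4}\b"),
}

def redact_pii(text):
    """Replace matched PII with typed placeholders, e.g. [EMAIL],
    so only the redacted transcript ever leaves the device."""
    for label, pattern in PII_PATTERNS.items():
        text = pattern.sub(f"[{label}]", text)
    return text

transcript = "Reach me at jane@example.com or 555-123-4567."
print(redact_pii(transcript))  # Reach me at [EMAIL] or [PHONE].
```

A production redactor would use a trained entity recognizer rather than regexes, but the control flow is the same: redact locally first, then send only the sanitized text to the cloud.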
5. You mentioned that AI translation is bidirectional. Is bidirectionality also planned for accent localization to assist agents?
While we monitor this opportunity closely, our short-term roadmap for Accent Localization is focused on enhancing outgoing voice comprehension first and then bringing this technology to a wider variety of speakers.
In the meantime, real-time transcripts in Agent Copilot may provide the support and tools to elevate an agent’s understanding of customers’ speech.
6. What makes Krisp’s noise cancellation superior to high-quality headsets or other tech-based noise cancellation alternatives?
Noise-canceling headsets have limited processing power and can only manage stationary background noises.
In contrast, Krisp's AI model can handle both stationary and non-stationary noises, providing more comprehensive noise cancellation. And, unlike headsets, Krisp can remove background noise not only from the agent's outgoing audio but also from the customer's inbound audio stream, offering a unique two-way noise cancellation feature.
Overall, Krisp offers the following that headsets and other technology providers can’t:
Krisp is tested on over 2 trillion minutes of conversations across 200M+ devices
Superior Noise Cancellation for both stationary and non-stationary noises
Two-way Noise Cancellation: removes noise from both the agent's side and the customer's side
Low CPU usage: Optimized for a range of CPUs, allowing smooth performance on lower-end devices.
Built for enterprise: Krisp is built for easy deployment and scale, with tools for centralized management and analytics.
No special hardware needed: Unlike high-end headsets, Krisp works with your existing setup (microphone, speakers, headphones).
Tested across devices and apps: Compatible out-of-the-box with all call center headsets and over 800 apps on Windows and macOS
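The two-way design described above can be sketched in a few lines. This is a conceptual illustration only, not Krisp's actual model: a hypothetical `suppress_noise` function (here a trivial noise gate standing in for an AI denoiser) is applied independently to the outbound (agent mic) and inbound (customer) audio streams.

```python
def suppress_noise(frame, threshold=0.05):
    """Hypothetical denoiser: zero out low-amplitude (noise-like)
    samples. A real AI model would separate speech from noise instead."""
    return [s if abs(s) >= threshold else 0.0 for s in frame]

def process_call_frame(mic_frame, speaker_frame):
    """Two-way processing: clean the agent's outgoing audio AND the
    customer's incoming audio, unlike a headset that only treats
    the wearer's side of the call."""
    outbound = suppress_noise(mic_frame)      # agent -> customer
    inbound = suppress_noise(speaker_frame)   # customer -> agent
    return outbound, inbound

out, inbound = process_call_frame([0.2, 0.01, -0.3], [0.02, 0.5, -0.01])
```

The key point the sketch captures is architectural: because processing runs in software on the device rather than in headset hardware, both directions of the call pass through the same denoising stage.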
7. One of Krisp's key differentiators is its on-device processing architecture. Can you discuss the device-based processing power required to manage these new technologies, especially in contact center environments with older-generation CPUs?
Many call centers use older machines, and Krisp has invested years and significant engineering effort to optimize its algorithms for these devices.
The CPU and memory required for optimal functioning vary considerably across the different Krisp products and their combinations; the existing load on the machine also matters. Generally, the software supports 7th or 8th-generation Intel i5 CPUs, which are sufficient for most Krisp functionalities.
Customers with outdated hardware often accelerate their PC refresh cycles to deploy these technologies, as the business value and fast ROI justify the investment in newer hardware.
8. This is a three-part question related to Krisp's AI Live Interpreter:
How much latency does the translation engine add, considering the inherent latency from telephony in platforms like Zoom or Teams?
AI Live Interpreter sends audio to the cloud for translation processing, unlike Noise Cancellation or Accent Localization, which are processed locally.
The latency mimics the experience of a human interpreter on a call, where some delay is necessary for the translation to be contextually accurate. This delay is expected and acceptable, so reducing latency is not the primary focus for us.

Is any voice data or transcription stored during processing?
No voice or audio data is stored; everything is processed in transit in the cloud, and the processed audio is immediately played back.

How does Krisp ensure privacy and compliance with regulations like GDPR and HIPAA?
The cloud used for translation is GDPR, PCI, and HIPAA-compliant. Because we do not store any voice data or transcription, this ensures the highest level of security and privacy.
9. Can the transcription also be translated from X language to English without having the live interpreter feature?
AI Live Interpreter is designed to enable seamless bidirectional communication between two call participants speaking different languages. While standalone translated transcription is unavailable today, our product team would be excited to explore this feedback and evaluate whether Krisp is positioned to serve this need. You can book a demo here.
10. How accurate is your AI Live Interpreter? If this is highly accurate, we do not need to rely on human error with medical terminology, etc.
The accuracy and consistency of live translation may vary between different language pairs. While the translation may be acceptable for some widely spoken languages, it might require additional production testing and validation for less commonly spoken languages. The agent can monitor transcripts and translations to confirm terminology is captured correctly.
11. Does the multilingual translation require English at either endpoint to work? Or can it be different language combinations? E.g., can a Spanish speaker serve a German-speaking customer?
AI Live Interpreter supports bidirectional interpretation between different language pairs.