At Krisp, we stand at the forefront of the Voice AI revolution.
We engage with dozens of companies every week and continuously collaborate with the industry leaders in Voice AI. During these interactions, we receive invaluable feedback and insights into the market and the technology.
Based on these insights, I present our 8 predictions for Voice AI in 2024.
Summary:
AI voice bots cover 1% of all call center calls
a major app launches live speech translation with 5-second latency
100K+ call center agents use accent conversion in live calls
20M+ people use AI note-takers in meetings
cloud STT gets 2x cheaper
major voice apps go from cloud STT to on-device
100K+ people use live call assist in their meetings
Voice AI startup funding surpasses $3B/year
1) Voice bots will cover 1% of all customer service calls
No doubt AI voice bots are going to disrupt the customer service industry.
They are powered by two key AI technologies: text-to-speech (TTS) and large language models (LLMs).
Today, TTS technologies are already so advanced that people have a hard time distinguishing real human voices from AI-generated ones. TTS latency and quality in calls will continue improving and will become market-ready in 2024.
LLM technology powers the intelligence of the bot. GPT-4-powered bots are already doing an amazing job at a reasonable cost.
As companies invest in fine-tuning LLMs and reducing existing problems such as “hallucinations”, LLMs will become ready for prime time in handling the low-hanging tasks in customer service.
Bots are easy to train, they don’t get tired, they have unlimited memory, they are cost-effective, and they scale easily. 2024 will be a big year for voice bots.
2) A major app will launch Speech-to-speech Translation with 5-second latency
Given the progress of Meta AI research in AI translation, we predict that a speech-to-speech translation feature with 5-second latency will be launched in a major voice communication app. It might be one of Meta’s products, but it could just as well come from one of their partners.
5 seconds is still too long for a natural conversation, but this will be a major breakthrough in overcoming the language barrier.
3) 100,000+ call center agents will use Accent Conversion technology in live calls
Krisp is on the frontline of AI-powered Accent Conversion technology (aka Accent Localization). This technology is advancing so rapidly that we have no doubt it will be market-ready in 2024.
Given the demand and excitement we see in the call center market, we expect the deployment of this magical technology to scale fast, benefiting agents, managers, and customers.
4) 20M+ people will use AI note-takers during meetings
Many tools already offer meeting transcription, summary, and follow-up generation.
The quality of the technology varies from 60% to 80% for now. No doubt it will keep improving, and the manual work will be fully automated in the coming year or two.
There are already multiple companies doing this.
Both Krisp and Zoom recently announced reaching 1M+ automatic AI meeting summaries. These numbers will undoubtedly grow significantly in 2024.
5) Cloud Speech-to-text will get 2x cheaper
The launch of Whisper disrupted the Speech-to-text market. The market keeps growing but, strangely enough, the price has not decreased dramatically. STT prices are ridiculously high today. The high price is due to the cost of audio traffic in the cloud as well as cloud GPU costs.
Three things will drive costs down:
Inference optimizations in Whisper and other STT technologies
More efficient deployment of GPUs
Significant increase in demand for STT in many existing and new use cases
6) Major voice apps will switch Speech-to-text from cloud to on-device
There are several reasons to run STT workloads on-device rather than in the cloud:
Cost savings (cloud STT is very expensive)
Lower and more consistent latency for use cases such as Live Caption and Live Assist
We predict that several major voice communication apps will switch to on-device STT.
7) 100K+ people will use Live Call Assist in their daily meetings
Live Call Assist technologies help users be more effective in their conversations by giving them real-time hints on what to say and how to answer specific questions. Such technology is already being used in call centers.
With the fast advancements in on-device STT and LLM technologies, it’s apparent that this technology will become part of our daily routine beyond call centers.
We predict that the “second brain” sitting next to you and coaching you during your meetings will already be a reality in 2024.
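To make the idea of real-time hints more concrete, here is a minimal sketch of how a live assist loop might look once transcript segments are available. Everything in it is an assumption made for illustration: the `HintRule` class, the `live_hints` function, and the sample rules are hypothetical, and simple keyword matching stands in for the LLM that a real product would likely use.

```python
import re
from dataclasses import dataclass
from typing import Iterable, Iterator, Tuple


@dataclass(frozen=True)
class HintRule:
    """Hypothetical rule: if any keyword is heard, show the hint."""
    keywords: Tuple[str, ...]
    hint: str


# Illustrative rules only; a real system would likely ask an LLM instead.
RULES = (
    HintRule(("price", "pricing", "cost"),
             "Mention the pricing tiers and offer to send a quote."),
    HintRule(("cancel", "refund"),
             "Acknowledge the concern and explain the refund policy."),
)


def live_hints(segments: Iterable[str],
               rules: Tuple[HintRule, ...] = RULES) -> Iterator[Tuple[str, str]]:
    """Yield (segment, hint) pairs as transcript segments arrive.

    `segments` is assumed to be the text output of an STT engine,
    delivered one utterance at a time.
    """
    for segment in segments:
        # Lowercase and split into words so matching ignores punctuation.
        words = set(re.findall(r"[a-z']+", segment.lower()))
        for rule in rules:
            if words & set(rule.keywords):
                yield segment, rule.hint
                break  # at most one hint per segment


if __name__ == "__main__":
    transcript = [
        "How much does it cost?",
        "Nice weather today.",
        "Can I get a refund?",
    ]
    for heard, hint in live_hints(transcript):
        print(f"Heard: {heard!r} -> Hint: {hint}")
```

Because `live_hints` is a generator, hints are produced as each segment arrives rather than after the meeting ends, which is the property that matters for a live assistant.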
8) Voice AI startup funding will surpass $3B/year
Voice AI technology is growing very fast, and the startups helping build the future have a chance to capitalize on the progress described in the previous 7 points.
There are multiple Series A-F companies in the space waiting for explosive growth.
VCs are excited about this market too and we predict that they will significantly increase the funding in the space.