In the Future of Voice AI series of interviews, I ask my guests three questions:
- What problems do you currently see in Enterprise Voice AI?
- How does your company solve these problems?
- What solutions do you envision in the next 5 years?
This episode’s guest is Will Bodewes, CEO at Phonely.ai.
Will Bodewes is the Co-founder and CEO of Phonely.ai, a Y Combinator–backed startup building conversational phone support powered by AI. A lifelong competitor and creator, he earned a mechanical engineering degree from UNH and launched his first company, Spoke Sound, soon after. Following AI research and travels across Africa, Asia, and the Pacific, Will combined his technical background and curiosity to take on one of tech’s toughest challenges: making AI sound human.
Phonely provides AI-powered phone support agents for industries requiring fast, reliable, and human-like AI interactions. Its AI solutions reduce wait times, improve customer experiences, and enable seamless automated conversations.
Recap Video
Takeaways
Voice AI jumped from niche to movement in two years, with young builders driving it.
Reliability at scale beats clever prompts; buyers want systems that just work.
Time-to-value is the moat; months of coding kill deals.
Every AI agent succeeds only if it knows what to say, what to know, and what to do: conversation, context, and action.
Integrations are the choke point; the hard work is plumbing messy CRMs and legacy tools.
Training BPO teams to build on the platform scales better than flying in engineers.
LLMs are the latency bottleneck, so faster token generation means more human-sounding conversations.
Groq partnership delivered lower latency and beat big names on some Phonely benchmarks.
“Did the caller detect it wasn’t human?” is a better quality metric than word error rate (WER).
Phonely claims 100% function-calling accuracy in production, which is what buyers actually feel.
Low ASR confidence should trigger human-like behavior (ask to spell names), not clunky links.
Capturing names, numbers, and addresses is the last-mile blocker; fix this or nothing else matters.
Cascading still wins for business logic; speech-to-speech isn’t reliably deployed in production.
Best near-term wins: customer support with tight FAQs, lead qualification, and appointment setting.
Defined outcomes plus A/B testing let agents match call-center KPIs at 50–70% lower cost.
Enterprise rollout will be gradual (2–3 years) until hallucination fear fades.
The next unlock is LLMs that talk like people while staying fast and precise.
Expect convergence where “voice-to-voice” and cascading blur, but LLMs keep the reasoning core.
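
The takeaway about low ASR confidence (ask the caller to spell a name rather than send a link) can be illustrated with a small sketch. This is not Phonely's implementation; the `AsrResult` type, the `next_prompt` function, and the 0.80 threshold are all hypothetical names and values chosen for illustration, and the sketch assumes the speech recognizer reports a confidence score between 0 and 1.

```python
from dataclasses import dataclass

# Hypothetical threshold; a real system would tune this per field and per ASR vendor.
CONFIDENCE_THRESHOLD = 0.80


@dataclass
class AsrResult:
    """One transcribed utterance plus the recognizer's confidence (assumed 0.0 to 1.0)."""
    text: str
    confidence: float


def next_prompt(field_name: str, result: AsrResult,
                threshold: float = CONFIDENCE_THRESHOLD) -> str:
    """Decide what the agent says next after capturing a field such as a name."""
    if result.confidence >= threshold:
        # Confident enough: read the value back so the caller can correct it.
        return f"Thanks, I have your {field_name} as {result.text}."
    # Low confidence: behave like a human agent and ask the caller to spell it.
    return (f"Sorry, I want to make sure I get that right. "
            f"Could you spell your {field_name} for me?")


if __name__ == "__main__":
    print(next_prompt("last name", AsrResult("Bodewes", 0.42)))
    print(next_prompt("last name", AsrResult("Bodewes", 0.95)))
```

The design choice here is to keep the recovery inside the conversation: the fallback is another spoken turn, which is what a human agent would do, instead of handing the caller off to a text link or web form.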