Beyond Cascades to Speech-to-Speech | Anshul Shrivastava & Kumar Saurav (Co-Founders at Vodex.ai)

In the Future of Voice AI interview series, I ask each guest three questions:

- What problems do you currently see in Enterprise Voice AI?
- How does your company solve these problems?
- What solutions do you envision in the next 5 years?

This episode’s guests are Anshul Shrivastava, Co-Founder and CEO, and Kumar Saurav, Co-Founder and CTO, at Vodex.ai.

Vodex specializes in Generative AI-powered voice agents that facilitate natural, humanlike conversations with customers. These virtual agents manage the initial phases of customer interactions, offering businesses a scalable and efficient way to handle inbound and outbound sales and collections calls. By personalizing conversations and providing real-time insights, Vodex helps businesses improve engagement and streamline processes.

Anshul Shrivastava is the Co-Founder and CEO of Vodex.ai, with 12+ years in the IT industry and a strong focus on AI innovation. He leads Vodex.ai in building global AI solutions, aiming to drive growth and deliver real impact for clients. Anshul views technology as a catalyst for progress and is passionate about shaping the future of AI.

Kumar Saurav is the Co-Founder and CTO of Vodex.ai, where he drives the development of generative AI solutions for business. With 13+ years across IT, IoT, Robotics, and AI, he brings both technical depth and business insight to solving client challenges. At Vodex.ai, he focuses on AI-powered outbound call solutions that boost sales, service, and marketing performance, while sharing his expertise through writing and research.

Takeaways

  • Voice AI still hasn’t had its ChatGPT moment because people hate talking to bots that feel slow or robotic.

  • Latency is the deal breaker: anything slower than 300 ms breaks the illusion of real conversation.

  • Cascading pipelines lose tone, emotion, and context, making bots sound flat and unreliable.

  • Speech-to-speech models are the real unlock, combining speed with emotional nuance.

  • Most voice AI agents are stitched together from automatic speech recognition (ASR), large language model (LLM), text-to-speech (TTS), and telephony (telco) layers.

  • Vodex positions itself as the “Stripe of voice AI” with simple plug-and-play APIs.

  • Vertical focus matters: collections is their strongest domain, where they operate under strict Fair Debt Collection Practices Act (FDCPA) compliance.

  • Naturalness is not a “nice to have”: it directly drives revenue and customer trust. In one Arabic deployment, recovery rose from 45% to 81% in seven days.

  • The bar is rising fast; in two years robotic-but-functional bots will be unacceptable.

  • Proven sweet spots for voice AI right now: lead qualification, debt collection, healthcare scheduling, and follow-ups.

  • Vodex’s origin story shows the shift from slow custom builds to no-code, plug-and-play bots for non-technical users.

  • Context engineering and AI-on-AI testing are how they handle edge cases and reliability gaps.

  • The future of voice will run on small, task-specific speech models built for speed and accuracy.

  • Gen Z decision makers will push companies to embrace talking to systems instead of clicking around apps.

  • Vodex rejects cold-call spam, betting that contextual, consent-based conversations will define the industry.

  • Soon, every company will be expected to have a natural voice agent the same way every company is expected to have a website.
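
To make the latency argument concrete, here is a toy budget check. It assumes a strictly sequential cascade in which each stage waits for the previous one to finish. The stage names mirror the layers listed above (telco, ASR, LLM, TTS), and the 300 ms budget comes from the takeaways. The per-stage latencies are hypothetical numbers chosen only to illustrate the arithmetic. They are not Vodex measurements.

```python
from dataclasses import dataclass

# Budget cited in the takeaways: replies slower than ~300 ms break the illusion.
TURN_BUDGET_MS = 300


@dataclass
class Stage:
    name: str
    latency_ms: float  # hypothetical per-stage latency, not measured data


def cascade_latency(stages: list[Stage]) -> float:
    """Sequential pipeline: each stage waits for the previous one, so latencies add."""
    return sum(s.latency_ms for s in stages)


def within_budget(stages: list[Stage], budget_ms: float = TURN_BUDGET_MS) -> bool:
    """True if the whole pipeline answers within the conversational budget."""
    return cascade_latency(stages) <= budget_ms


# Hypothetical numbers, for illustration only.
cascade = [
    Stage("telco", 40),
    Stage("asr", 120),
    Stage("llm", 200),
    Stage("tts", 100),
]
speech_to_speech = [
    Stage("telco", 40),
    Stage("speech-to-speech model", 200),
]

for label, pipeline in [("cascade", cascade), ("speech-to-speech", speech_to_speech)]:
    total = cascade_latency(pipeline)
    print(f"{label}: {total:.0f} ms, within budget: {within_budget(pipeline)}")
```

Production systems usually stream partial results between stages, so real latencies overlap rather than simply add. The sketch only shows that every extra hop in a cascade draws on the same roughly 300 ms budget. That is one reason a single speech-to-speech model is attractive.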
