
OpenAI’s gpt-realtime: Enhanced Voice AI Drives Enterprise Adoption

Tags: AI Voice, OpenAI, Realtime API, Generative AI, Voice AI, LLMs, NLP
August 28, 2025
Viqus Verdict: 7
Voice Evolution, Not Revolution
Media Hype 8/10
Real Impact 7/10

Article Summary

OpenAI’s latest voice model, gpt-realtime, is aimed squarely at enterprise voice applications. The model emphasizes improved instruction following, scoring 30.5% on the MultiChallenge audio benchmark, a significant increase over previous models. Key advancements include ‘more natural and expressive’ voices and the ability to follow complex instructions, such as speaking in a specific accent. Alongside the model, OpenAI has broadened the Realtime API, adding Session Initiation Protocol (SIP) support for contact center use cases and support for image inputs. The model also brings enhanced function calling, which gives it access to external tools and mirrors recent advances in LLMs. The launch includes two new voices, Cedar and Marin, and a 20% price cut, to $32 per million audio input tokens and $64 per million audio output tokens. Competition is intensifying, with providers such as ElevenLabs and Hume also offering advanced voice models. The focus on practical scenarios, shown in demonstrations with T-Mobile and Zillow, reflects the industry’s shift toward tangible applications.
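To make the pricing concrete, here is a minimal Python sketch that estimates audio-token cost from the two per-million rates quoted above and backs out the price implied before the 20% cut. The function names and the token counts in the usage lines are hypothetical, and the back-calculation assumes the 20% reduction applies uniformly to both rates, which the summary does not state explicitly.

```python
# Prices per one million audio tokens, as quoted in the article summary.
AUDIO_INPUT_USD_PER_MILLION = 32.0
AUDIO_OUTPUT_USD_PER_MILLION = 64.0
PRICE_CUT = 0.20  # the 20% reduction reported in the article


def audio_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Estimate the audio-token cost in USD for one workload."""
    input_cost = input_tokens / 1_000_000 * AUDIO_INPUT_USD_PER_MILLION
    output_cost = output_tokens / 1_000_000 * AUDIO_OUTPUT_USD_PER_MILLION
    return input_cost + output_cost


def implied_previous_price(current_price: float, cut: float = PRICE_CUT) -> float:
    """Back out the pre-cut price, assuming the cut applies uniformly (an assumption)."""
    return current_price / (1.0 - cut)


if __name__ == "__main__":
    # Hypothetical workload: 1M audio input tokens and 500k audio output tokens.
    print(f"Estimated cost: ${audio_cost_usd(1_000_000, 500_000):.2f}")
    print(f"Implied earlier input price: ${implied_previous_price(32.0):.2f} per million")
```

Because output tokens cost twice as much as input tokens, a deployment where the model does most of the talking will spend more on output than on input for the same number of tokens.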

Key Points

  • OpenAI’s gpt-realtime model prioritizes improved instruction-following capabilities, boosting accuracy and control.
  • The model offers ‘more natural and expressive’ voices, reflecting advances in AI voice generation technology.
  • Updates to the Realtime API, including support for SIP and image inputs, expand the model’s applicability across diverse enterprise workflows.

Why It Matters

gpt-realtime marks a notable step in the maturation of voice AI for business. While the demonstrations are promising, the core value lies in the API’s expanded functionality, particularly SIP support and the ability to call external tools, which will ultimately determine whether these models move from impressive prototypes to integral parts of real-world operational systems. The launch also underscores growing investment and competition in the voice AI market, suggesting that sophisticated, adaptable voice solutions are fast becoming a necessity for businesses looking to streamline customer interactions and automate workflows. For AI practitioners, it signals a move beyond simple voice assistants toward integrated, adaptable AI-powered platforms, with a corresponding shift in focus toward efficiency, security, and practical implementation.
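As an illustration of what interacting with external tools can look like in practice, the Python sketch below builds a JSON message that registers one function tool for a voice session. The article does not describe the message format. The `session.update` event and the tool fields used here are an assumption modelled on OpenAI's published Realtime API event format and should be checked against the current API reference. The tool name `lookup_order_status` and its parameters are hypothetical.

```python
import json


def build_session_update(tool_name: str, description: str, parameters: dict) -> str:
    """Serialize a session update that registers a single function tool.

    The event layout is an assumption for illustration, not a confirmed schema.
    """
    event = {
        "type": "session.update",
        "session": {
            "tools": [
                {
                    "type": "function",
                    "name": tool_name,
                    "description": description,
                    "parameters": parameters,  # JSON Schema for the arguments
                }
            ],
            "tool_choice": "auto",  # let the model decide when to call the tool
        },
    }
    return json.dumps(event)


if __name__ == "__main__":
    # Hypothetical contact-center tool for looking up an order.
    schema = {
        "type": "object",
        "properties": {"order_id": {"type": "string"}},
        "required": ["order_id"],
    }
    print(build_session_update(
        "lookup_order_status",
        "Look up the shipping status of a customer order.",
        schema,
    ))
```

In a real deployment the application would send this message over the session's connection, run the named function when the model requests it, and return the result so the model can speak the answer.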
