OpenAI Unveils GPT-Realtime: A Leap Forward in Speech-to-Speech AI

OpenAI has launched GPT-Realtime, its most advanced speech-to-speech model, and has moved its Realtime API out of beta with new enterprise-grade features. GPT-Realtime understands and generates audio directly in a single model, which reduces latency and preserves nuances such as emotion and laughter that are often lost in traditional pipelines that chain speech-to-text and text-to-speech models. The new model also improves on its predecessor in reasoning, instruction following, and function calling.

The updated Realtime API now supports image input, Session Initiation Protocol (SIP) for connecting to phone networks, and remote Model Context Protocol (MCP) servers for plugging in external tools. OpenAI also introduced two new voices, Cedar and Marin, and cut Realtime API pricing by 20% compared with the previous preview model.
...