OpenAI releases new speech model GPT-Realtime designed for speech AI Agent

AIbase基地

Published inAI News · 5 min read · Aug 29, 2025

OpenAI held a technical live stream at 1 AM and officially launched its new speech model - GPT-Realtime. This multimodal model is designed for speech AI Agents, aiming to generate more natural and smooth speech, capable of imitating the rich and diverse tones, emotions, and speech rates of humans. GPT-Realtime has a wide range of application scenarios, covering areas such as customer service, education, finance, and healthcare, providing strong support for creating intelligent voice assistants.

GPT-Realtime introduces two unique speech styles - Marin and Cedar, and comprehensively upgrades the original eight speech styles. Unlike traditional speech models, GPT-Realtime can not only generate speech but also has intelligence, reasoning, and understanding capabilities. For example, the model can accurately capture non-verbal signals such as laughter and switch languages flexibly in conversations to adapt to different scenario needs.

In terms of evaluation, GPT-Realtime has significantly improved the accuracy of letter and number sequence detection in multiple language environments, with an accuracy rate of up to 82.8% in reasoning ability assessments, making it a leader among current intelligent speech models. The improvement in instruction following capability is also a major highlight of this model. Developers can customize instructions to enhance the model's response effectiveness. In the MultiChallenge audio benchmark test, GPT-Realtime's instruction following accuracy increased from 20.6% to 30.5%.

Aside from speech generation capabilities, GPT-Realtime also supports image input. Developers can combine images with audio or text in conversations, allowing the model to engage in dialogue based on what the user sees, providing a more personalized interactive experience. Additionally, the new features of Realtime API allow developers to easily connect to remote MCP servers, simplifying the integration process and improving development efficiency.

In terms of security and privacy, Realtime API is equipped with multiple layers of protection measures, monitoring conversation content in real-time to prevent abuse. At the same time, developers can add custom security protection as needed to ensure the safety of the usage environment.

From the day of release, all developers can use the new Realtime API and GPT-Realtime model. The price of audio input tokens has been reduced by 20%. Additionally, developers can flexibly set smart token limits to reduce the cost of long conversations.

Key Points:
🌟 GPT-Realtime is OpenAI's latest multimodal speech model, suitable for areas such as customer service and education.
📈 The model has significant improvements in reasoning ability and instruction following accuracy, providing stronger support for developers.
🔒 Realtime API is equipped with security protection measures, ensuring the safety and privacy of user interactions.

NetEase Youdao Document Translation Function Now Free for All Users, Enhanced by the Ziyue Education Large Model to Improve Multilingual Communication Efficiency

On August 28, 2025, NetEase Youdao announced that its powerful document translation function is now officially available free of charge to all users. This move aims to provide users with a more efficient and accurate multilingual translation experience, especially in professional fields such as finance and economics, computer science, and medicine. Key Highlights: Enhanced by the Education Large Model, Translation Quality Has Significantly Improved. The document translation function now being made free includes the self-developed "Ziyue" education large model by NetEase Youdao. This model supports mutual translation among eight languages and claims to achieve world-leading results through optimized algorithms.

NetEase Cloud Music Launches AI Music Recommendation Feature for Easy Customization of Personalized Playlists

NetEase Cloud Music has announced the launch of a new 'AI Recommendation' feature, allowing users to easily create personalized playlists. With this feature, users can simply search for 'AI Recommendation' in the NetEase Cloud App to quickly find and use the service. The key highlight of this feature is that users only need to input a sentence describing their needs, such as 'Workplace Energy Boost' or 'K-pop Songs from Heartfelt Playlists,' and the NetEase AI Recommendation will generate corresponding playlists in real-time based on the user's music style, era, and preference data. Users can also

OpenAI Unveils Major Update! GPT-Realtime Speech Model Launches, Supports Image Input - AI Interaction Is About to Go Rogue!

OpenAI officially launched its latest speech model, GPT-Realtime. This multimodal speech agent model has sparked industry discussion with its powerful reasoning capabilities, support for image input, and optimized command following functionality. According to the latest information from AIbase, GPT-Realtime not only achieves breakthroughs in speech interaction, but also provides developers with a smarter and more flexible speech agent solution by integrating features such as image input, remote MCP, and SIP phone calls. GPT-Real

Tencent Yuanbao Joins WeChat Video Comments Section: AI Chat Partner Upgraded, Empowering Efficient Interaction

Recently, Tencent's artificial intelligence assistant "Tencent Yuanbao" has officially joined the comments section of WeChat Video, offering users a new interactive experience. This feature is currently in gray-scale testing. When browsing videos, users can simply @Tencent Yuanbao in the comments section to receive real-time Q&A, summaries, and suggestions related to the video content, making the comments section not just a place for user interaction, but also an efficient information acquisition platform. The entry of Tencent Yuanbao greatly improves users' efficiency in obtaining information. When watching content-rich videos, Yuanbao can quickly summarize...

Latest AI News

AI Daily Brief

AI Product Finder

AI Product Rankings

AI Product Submit

AI Tools Directory

Building and Deploying AI

AI Models Finder

LLM Leaderboard

Model Providers

Submit Your Model

Compare LLMs

LLM Cost Calculator

LLM Arena

MCP Servers

MCP Client

MCP Case Tutorials

MCP Ranking

MCP Service Submission

MCP Playground

MCP Inspector

OpenAI releases new speech model GPT-Realtime designed for speech AI Agent

AIbase基地

This article is from AIbase Daily

AI News Recommendations

Unexpected Performance! Alibaba Cloud Grows by 26% and Leads the Way, AI Revenue Continues to See Triple-Digit Growth for 8 Consecutive Quarters

Yao Xin from PPIO: The PDA Thinking Essential for AI Entrepreneurs to Drive the Global Intelligent Revolution!

Billionaire Dan Loeb Sells Stake in TSMC, Invests in Another Trillion-Dollar Artificial Intelligence Company

Baidu Search AI Assistant Fully Launches with the Super Fast Model, Search Result Generation Speed Significantly Improved

AI Daily: Hailuo AI's First and Last Frame Feature Launches; Yuan Shi Technology Releases Wenti Bai 5; OpenAI Releases New Speech Model GPT-Realtime

NetEase Youdao Document Translation Function Now Free for All Users, Enhanced by the Ziyue Education Large Model to Improve Multilingual Communication Efficiency

NetEase Cloud Music Launches AI Music Recommendation Feature for Easy Customization of Personalized Playlists

OpenAI Unveils Major Update! GPT-Realtime Speech Model Launches, Supports Image Input - AI Interaction Is About to Go Rogue!

Apple Xcode Integrates Claude Sonnet 4: The AI Revolution Era for iOS Development Has Arrived

Tencent Yuanbao Joins WeChat Video Comments Section: AI Chat Partner Upgraded, Empowering Efficient Interaction