OpenAI announced that it is retiring the separate "voice mode" entry point and integrating real-time voice and visual output directly into the main ChatGPT chat window. Users can tap the microphone icon to speak and view maps, charts, and images at the same time, with the conversation transcript appearing in real time, so there is no need to switch screens.

Core Updates  

- Multimodal Display: when you ask a question by voice, the interface shows related visual results (route maps, data charts, product images, etc.) in real time and auto-scrolls the text transcript

- Zero-Interruption Interaction: you can keep asking follow-up questions while the model updates the visuals and answers by voice, with an average latency under 300 ms

- Opt-Out Toggle: Settings → Voice → "Immersive Audio Mode" lets you switch back to the old standalone interface if you prefer an audio-only experience

Technical Foundation  

The new voice experience is powered by GPT-5.1-large plus a multimodal visual encoder, with a 100k-token context window. Voice processing combines on-device VAD (voice activity detection) with cloud ASR (speech recognition), reaching 96% transcription accuracy across 12 supported languages.
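
The announcement does not describe the pipeline beyond "on-device VAD + cloud ASR", but the gating idea can be illustrated in a few lines. The sketch below is a minimal, assumption-heavy example: an energy-based VAD decides per frame whether audio looks like speech, and only those frames would be forwarded for cloud transcription. The frame length, threshold, and the `transcribe_on_cloud` placeholder are all hypothetical.

```python
import numpy as np

FRAME_MS = 30            # frame length used for the energy estimate
ENERGY_THRESHOLD = 0.01  # illustrative value; a real client would tune this per device


def is_speech(frame: np.ndarray, threshold: float = ENERGY_THRESHOLD) -> bool:
    """Crude energy-based voice-activity check on one float32 frame in [-1, 1]."""
    return float(np.mean(frame ** 2)) > threshold


def gate_audio(audio: np.ndarray, sample_rate: int = 16_000):
    """Yield only frames that look like speech, so silence never leaves the device."""
    frame_len = sample_rate * FRAME_MS // 1000
    for start in range(0, len(audio) - frame_len + 1, frame_len):
        frame = audio[start:start + frame_len]
        if is_speech(frame):
            yield frame


def transcribe_on_cloud(frames) -> str:
    """Hypothetical stand-in for the cloud ASR request; the real endpoint is not public."""
    raise NotImplementedError


if __name__ == "__main__":
    # One second of synthetic 16 kHz audio: silence followed by noise standing in for speech.
    silence = np.zeros(8_000, dtype=np.float32)
    speech = np.random.uniform(-0.5, 0.5, 8_000).astype(np.float32)
    kept = list(gate_audio(np.concatenate([silence, speech])))
    print(f"{len(kept)} frames would be sent to cloud ASR")
```

Production VAD is typically a small neural model rather than a raw energy threshold, but the on-device/cloud split works the same way: silence is filtered locally and only likely speech is uploaded.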

Release and Coverage  

- Immediate Rollout: available now on all platforms for Plus/Pro/Team users; the free tier will gain access gradually later

- Hardware Compatibility: optimized for the iPhone 15 series and Pixel 9, with less than a 4% battery-life impact in low-power mode

- API Plan: the RealtimeMultimodal interface is slated to open to developers in Q1 2026, letting third-party apps use the same voice and visual capabilities (see the sketch below this list)
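
Since the RealtimeMultimodal interface has not been published, any client code is speculative. The sketch below assumes a WebSocket endpoint and event schema loosely modeled on OpenAI's existing Realtime API; the URL, event names, and payload fields are invented for illustration, and authentication is omitted.

```python
import asyncio
import json

import websockets  # third-party package: pip install websockets

# Assumed endpoint; the real URL for RealtimeMultimodal has not been announced.
WS_URL = "wss://api.openai.com/v1/realtime-multimodal"


async def ask_with_visuals(question_audio_b64: str) -> None:
    """Send one spoken question and print interleaved transcript text and visual events."""
    async with websockets.connect(WS_URL) as ws:
        # Hypothetical input event carrying base64-encoded speech audio.
        await ws.send(json.dumps({"type": "input_audio", "audio": question_audio_b64}))
        async for raw in ws:
            event = json.loads(raw)
            if event.get("type") == "transcript.delta":    # streamed transcript text
                print(event["text"], end="", flush=True)
            elif event.get("type") == "visual.update":      # chart/map/image payload
                print(f"\n[visual: {event.get('kind')} -> {event.get('url')}]")
            elif event.get("type") == "response.done":      # end of the model's turn
                break


if __name__ == "__main__":
    asyncio.run(ask_with_visuals("<base64-encoded audio>"))
```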

OpenAI said this integration is the first step toward the "ChatGPT 6.0 experience"; future updates will add scenarios such as comparison shopping and group voice chats, further expanding its multimodal capabilities.