Building on the GPT-4o model released last year, OpenAI has significantly updated ChatGPT's advanced voice mode, making spoken exchanges feel more natural and closer to human conversation. The feature runs on a natively multimodal model that can respond to audio input in as little as 232 milliseconds, with an average of 320 milliseconds, roughly matching human response times in conversation.

Earlier this year, OpenAI made minor updates to the voice mode that improved how it handles interruptions and accents. This larger upgrade makes responses more nuanced in intonation and more natural in cadence, particularly in the use of pauses and emphasis, so interactions feel more lively. The updated system also conveys a range of emotions more accurately, including empathy and sarcasm, which makes conversations with the model feel more human.

The update also adds live translation. With a simple instruction, ChatGPT will translate a conversation in real time and keep doing so until it is told to stop. This could reduce the need for dedicated voice translation apps. For now, the updated advanced voice mode is available only to paying users.

Despite these improvements in voice quality, OpenAI acknowledges that the update still has known limitations. In some cases audio quality may degrade slightly, and tone and pitch may shift unexpectedly, which is more noticeable with certain voice options. In rare cases the model may also hallucinate, producing unintended sounds that resemble ads, gibberish, or background music. OpenAI says it will keep working to improve audio consistency and address these issues over time.

Beyond making voice interaction with AI feel more natural, this upgrade lays a stronger foundation for communication between people and artificial intelligence.