GPT-4o Voice Mode Fully Upgraded: Singing Function Released, Entering a New Realm of AI Interaction

AIbase基地

Published in AI News · May 27, 2025

OpenAI's GPT-4o Advanced Voice Mode has received significant updates recently. It now supports more natural voice conversations and has gained a notable new ability: singing. Although the singing still sounds somewhat rough, the feature opens up new possibilities for AI's multimodal interaction. AIbase has gathered the latest information to analyze recent developments in GPT-4o's voice mode and its potential.

Singing Feature Launched: AI Can Now “Sing”

Recent reports show that GPT-4o's Advanced Voice Mode now supports singing. Users can ask it by voice to sing songs, including some copyrighted tracks. Depending on the request, GPT-4o can generate melodies and lyrics or imitate a particular singing style, which makes interactions more playful. Although the "performance" still needs polish, AIbase sees the feature as a new step for GPT-4o in audio generation.

Multimodal Interaction Upgraded: More Natural and Emotional

GPT-4o's Advanced Voice Mode is known for end-to-end voice processing. Traditional voice modes chain separate steps: they convert speech to text, generate a text reply, and then convert that reply back to speech. The new mode processes audio input directly, which cuts response latency substantially, to an average of about 320 milliseconds. GPT-4o can also pick up non-verbal cues such as speaking speed and tone, and it responds in a more emotionally expressive voice. Users can interrupt it at any point, so conversations feel close to talking with a person.

Feature Highlights: Laughter, Crying, and Character Voices

Beyond singing, the Advanced Voice Mode can also produce laughter, crying, and other emotional expressions on request, which widens the range of possible interactions. For example, users can ask it to answer in a dramatic or humorous tone, or in the voice of a specific character, such as an animated character or a celebrity. This flexibility gives it considerable potential in entertainment, education, and creative content production.

Current Limitations: Singing Still Needs Refinement

Although singing is now available, GPT-4o's singing is not yet at a professional level. In testing, it can sound less fluid on complex melodies or high notes. Some users have also reported that its voice quality is slightly inferior to that of other AI voice products (such as Pi AI or Siri), attributing this to a lower sampling rate that audibly compresses the sound. OpenAI stated that singing was added to explore the boundaries of audio generation and that its performance will continue to be improved.

Safety and Copyright Considerations: Innovation Within Limits

To respect copyright, OpenAI has placed strict filters on GPT-4o's voice output to limit the generation of copyrighted music. However, recent reports show that some users have still managed to get it to sing copyrighted songs, sparking debate about where the copyright boundaries lie. GPT-4o also declines a high share of certain audio tasks (such as automatic singing scoring or voice synthesis), possibly to avoid producing unauthorized content or because such tasks lack objective evaluation standards.

A New Chapter for Voice AI

This update to GPT-4o's Advanced Voice Mode, especially the new singing feature, is another step forward for OpenAI in multimodal AI. Although the singing still needs work, its low latency, natural conversational flow, and emotional expressiveness already put it well ahead of traditional voice assistants such as Siri and Alexa. AIbase believes that as OpenAI keeps improving sound quality and its copyright-handling mechanisms, GPT-4o could spark a new wave of applications in education, entertainment, and customer service.

Conclusion

The singing feature in GPT-4o's Advanced Voice Mode makes AI interaction more fun and opens up new possibilities. The technology still needs refinement, but its significance as an innovation is hard to ignore. From low-latency dialogue to emotional expression, GPT-4o is redefining the boundaries of human-computer interaction.

This article is from AIbase Daily