OpenAI has officially launched ChatGPT Agent, a new AI tool that marks a major shift from conversational assistant to autonomous task executor. ChatGPT Agent integrates the previously released Operator and Deep Research features, allowing it to complete complex tasks independently through a virtual browser, a terminal, and APIs, saving users time and improving efficiency.
Core Features: From Conversation to Action
ChatGPT Agent is no longer limited to text conversations. It can browse websites, click, and fill out forms much as a human user would, and it can also execute code and call APIs. It can handle a variety of tasks, such as selecting wedding outfits that fit a given budget and style, planning travel itineraries, generating professional reports, or creating presentations. OpenAI stated that the agent is driven by the GPT-4o model and combines the web interaction capabilities of Operator with the research functionality of Deep Research in a single unified system. Users only need to provide a single instruction, and the Agent can then complete multi-step tasks autonomously.
Performance: Exceeding Industry Standards
ChatGPT Agent has demonstrated leading performance in multiple benchmark tests. On "Humanity's Last Exam", its accuracy reached 41.6%, far exceeding the 20.3% of the earlier OpenAI o3 model and the 26.6% of Deep Research. In investment banking modeling tasks, the Agent achieved an average accuracy of 71.3%, outperforming competitors like Microsoft Copilot in Excel and PowerPoint-related tasks. Additionally, it scored 68.9% on BrowseComp and 65.4% on WebArena, two web navigation benchmarks, which suggests that it holds up on practical browsing tasks.
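To put the Humanity's Last Exam figures in perspective, the gap can be expressed both in percentage points and as a ratio. The short Python sketch below does this arithmetic using only the three scores quoted above; the function name `compare` and the dictionary layout are illustrative choices, not part of any OpenAI tooling.

```python
# Humanity's Last Exam accuracy figures quoted in the article, in percent.
HLE_SCORES = {
    "ChatGPT Agent": 41.6,
    "Deep Research": 26.6,
    "OpenAI o3": 20.3,
}


def compare(scores, subject):
    """Return {other: (percentage-point gain, ratio)} for `subject` vs. each other entry."""
    base = scores[subject]
    result = {}
    for name, value in scores.items():
        if name == subject:
            continue
        # Gain in percentage points and the multiplicative ratio, both rounded.
        result[name] = (round(base - value, 1), round(base / value, 2))
    return result


if __name__ == "__main__":
    for name, (gain, ratio) in compare(HLE_SCORES, "ChatGPT Agent").items():
        print(f"vs {name}: +{gain} points, {ratio}x")
```

On these numbers, the Agent's score is roughly twice that of o3 and about one and a half times that of Deep Research.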
Safety and Limitations: User Control is Central
OpenAI emphasized that safety is a priority in the design of ChatGPT Agent. Before performing high-consequence operations involving passwords or payments, the Agent requests explicit user authorization, and users can pause, interrupt, or take over a task at any time. To guard against malicious websites and prompt injection attacks, OpenAI has implemented strict protective measures, including restrictions on sensitive actions (such as bank transfers) and automatic deletion of browsing data. Additionally, OpenAI classifies the Agent as having "High" capability in the biological and chemical domains, which triggers additional security safeguards.
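The authorization behavior described above follows a common human-in-the-loop pattern: some actions are refused outright, others run only after the user approves them, and the rest proceed automatically. The Python sketch below illustrates that pattern in general terms. It is not OpenAI's implementation; the action names, the `Action` class, and the `run_action` function are all hypothetical, chosen only to mirror the examples in the text (passwords, payments, bank transfers).

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical action categories, modeled on the examples in the article.
NEEDS_APPROVAL = {"enter_password", "submit_payment"}
BLOCKED = {"bank_transfer"}


@dataclass
class Action:
    kind: str         # e.g. "click", "submit_payment"
    description: str  # human-readable summary shown to the user


def run_action(action: Action, approve: Callable[[Action], bool]) -> str:
    """Decide what happens to an action; `approve` asks the user for consent."""
    if action.kind in BLOCKED:
        # Restricted actions are never executed, even with approval.
        return "blocked"
    if action.kind in NEEDS_APPROVAL and not approve(action):
        # The user declined, so the agent must not proceed.
        return "declined"
    return "executed"


if __name__ == "__main__":
    always_yes = lambda action: True
    print(run_action(Action("click", "Open the checkout page"), always_yes))
    print(run_action(Action("submit_payment", "Pay for the order"), always_yes))
    print(run_action(Action("bank_transfer", "Send money"), always_yes))
```

Passing the approval step as a callback keeps the policy separate from the user interface: the same gate works whether consent comes from a dialog, a chat message, or a test stub.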
Availability and Future Plans
Currently, ChatGPT Agent is available to ChatGPT Pro, Plus, and Team users. Pro users get a monthly quota of 400 tasks, while Plus and Team users receive 40 tasks per month, and additional tasks can be purchased once the quota is used up. OpenAI plans to expand access to enterprise and educational users in the coming weeks. However, the feature is not yet available in the EU and Switzerland. OpenAI also revealed that the Agent could serve as the foundation for a more powerful model, such as the rumored GPT-5, and that future updates may integrate more features, such as payment settlement systems.
ChatGPT Agent's release comes amid intense competition in the AI industry. Microsoft's Copilot, Google's Gemini, and xAI's Grok are all competing for dominance in digital productivity interfaces. By launching the Agent, OpenAI not only reinforces its leadership in generative AI but also challenges traditional search engines and office software. Industry experts believe that ChatGPT Agent may redefine how users interact with the web and productivity tools, becoming a new benchmark for AI-driven automation.
AIbase believes that the launch of ChatGPT Agent represents a key step in OpenAI's move from conversational AI to full automation. Although complex tasks may take 15-30 minutes to execute, the Agent still significantly improves efficiency compared with doing the same work manually. As the technology is optimized and its features expand, ChatGPT Agent is expected to become a valuable assistant for both enterprises and individuals. However, data privacy and security issues still require ongoing attention. AIbase will continue to track the development of this product and provide readers with the latest insights.