OpenAI is adopting a new "fighting fire with fire" strategy to enhance the security of its代理式 web browser, ChatGPT Atlas. To address increasingly complex cyber threats, OpenAI has developed an "automated attacker" system that simulates real hacker attack methods to conduct round-the-clock stress tests on ChatGPT Atlas.
The core of this system lies in combating prompt injection attacks. In such attacks, malicious third parties secretly send instructions to the AI agent, prompting it to perform actions against the user's will, such as forwarding sensitive emails or deleting cloud files without the user's knowledge. AIbase learned that OpenAI's "automated attacker" uses advanced reinforcement learning techniques to autonomously discover new attack paths that human red teams may have overlooked.
In a real demonstration, this AI attacker successfully simulated a scenario where Atlas was induced to send a resignation letter to the CEO of a company. Although Atlas's defense mechanisms ultimately intercepted the request and alerted the user, OpenAI admits that the security battle is a long-term war. Since agent-based browsers need deep integration into users' digital lives (such as accessing emails and calendars), their convenience itself also creates a larger risk exposure surface.


