In 2022, as ChatGPT swept the globe, a small team within OpenAI called MathGen was quietly working on a more fundamental research problem: teaching AI models to perform mathematical reasoning. Today, that work has become a core technology behind OpenAI's AI agents and has sparked a talent war in Silicon Valley. This article traces OpenAI's journey with AI agents, exploring how the company moved from a low-key research project to an ambitious vision of general-purpose agents through reinforcement learning and breakthroughs in computation.

In 2022, when ChatGPT rose to prominence on the strength of its language capabilities and became one of the fastest-growing products in history, researcher Hunter Lightman was focused on a completely different task: leading a team called MathGen to teach OpenAI's models to solve high school math competition problems.


At that time, OpenAI's models struggled with mathematical reasoning. Yet this project, then viewed as basic research, laid the foundation for the company's later breakthroughs. Today, the team's work has become the core technology behind OpenAI's industry-leading AI reasoning models, which in turn power its AI agents.

OpenAI CEO Sam Altman described an ambitious vision at the company's first developer conference in 2023: "Eventually, you just need to tell the computer what you need, and it will do everything for you." He was referring to AI agents: AI systems capable of performing complex tasks on a computer the way a human would.

The Revival of Reinforcement Learning: From AlphaGo to the o1 Model

The path to OpenAI's AI agents is closely tied to a training technique called reinforcement learning (RL). Although RL gained fame in 2016, when Google DeepMind's AlphaGo defeated the world Go champion, OpenAI's breakthrough lay in combining it with large language models (LLMs).

OpenAI's early GPT-series models were good at processing text but struggled with basic math. It wasn't until 2023 that the OpenAI team achieved a breakthrough known as "Strawberry." The technique combined LLMs, reinforcement learning, and "test-time computation," which gives the model extra time and computing power to plan, verify, and solve problems. This breakthrough allowed OpenAI to introduce a "chain of thought" (CoT) approach, significantly improving the model's performance on math problems it had not seen before.
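OpenAI has not published the details of Strawberry, but one well-known public analogue of "spending more compute at test time" is self-consistency over sampled chains of thought: sample several reasoning traces and keep the answer they most often agree on. The sketch below illustrates that idea only; `sample_chain_of_thought` is a hypothetical stand-in for a call to a reasoning model, not OpenAI's actual pipeline.

```python
import random
from collections import Counter

def sample_chain_of_thought(question: str) -> tuple[str, str]:
    """Hypothetical model call: returns (reasoning_trace, final_answer)."""
    # Stubbed with a noisy arithmetic "solver" purely for illustration;
    # a real system would query a reasoning model here.
    answer = str(eval(question) + random.choice([0, 0, 0, 1]))  # occasionally off by one
    trace = f"Step 1: read '{question}'. Step 2: compute. Answer: {answer}"
    return trace, answer

def solve_with_test_time_compute(question: str, n_samples: int = 16) -> str:
    """Spend extra compute at inference time: sample several chains of
    thought and keep the answer they most often agree on (majority vote)."""
    answers = [sample_chain_of_thought(question)[1] for _ in range(n_samples)]
    return Counter(answers).most_common(1)[0][0]

if __name__ == "__main__":
    print(solve_with_test_time_compute("17 * 24"))  # expected: 408
```

The point of the sketch is the trade-off, not the stub: accuracy improves simply by sampling more reasoning chains at inference time, at the cost of extra computation.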

As OpenAI researcher El Kishky described it: "I saw the model start to reason. It noticed mistakes and backtracked. It even seemed frustrated. It felt like reading someone's mind."

This combination of technologies eventually led to the creation of OpenAI's reasoning model, o1. The planning and fact-checking abilities of o1 provided a solid foundation for building powerful AI agents. Lightman said the creation of o1 "solved a problem that had troubled me for years," and it was one of the most exciting moments in his research career.

The Value of o1 and the Talent War

In the fall of 2024, OpenAI released the o1 model, stunning the industry. The breakthrough showed that performance could be pushed further through new training methods. In less than a year, the 21 researchers behind o1 became some of the most sought-after talent in Silicon Valley.

Mark Zuckerberg recruited five o1 researchers to Meta's newly established superintelligence lab with compensation packages exceeding $100 million; among them was Shengjia Zhao, who was named the lab's chief scientist. The move underscores the strategic importance of AI reasoning models in the current technology race.

The Future of AI Agents: From Coding to Subjective Tasks

Although OpenAI's models have achieved gold-medal performance at the International Mathematical Olympiad, even the latest AI systems still hallucinate, and its agents still face challenges when executing complex tasks.

Currently available AI agents, such as OpenAI's Codex, work best in well-defined, verifiable domains like coding. They still struggle with more complex, subjective tasks such as shopping or finding a parking space.

OpenAI researcher Noam Brown said the company is exploring new general-purpose reinforcement learning techniques to handle these hard-to-verify tasks. That work produced the model that earned gold-medal performance in math competitions: it can spin up multiple "agents" to explore different ideas in parallel and then choose the best answer. Companies such as Google and xAI have started using similar techniques.
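The mechanism OpenAI uses has not been disclosed, but the general "many agents, pick the best answer" pattern can be sketched as parallel candidate generation followed by a selection step. In the hypothetical Python sketch below, `run_agent` and `score` are placeholders; in a real system the agents would be model runs and the selector might be a trained reward model or, in verifiable domains, an automatic checker.

```python
import concurrent.futures
import random

def run_agent(problem: str, seed: int) -> str:
    """One hypothetical agent exploring a single line of attack."""
    rng = random.Random(seed)
    return f"candidate #{seed} for '{problem}' (quality={rng.random():.2f})"

def score(candidate: str) -> float:
    """Hypothetical verifier/ranker: here it just reads the quality tag
    embedded by the stub agent above."""
    return float(candidate.rsplit("=", 1)[-1].rstrip(")"))

def solve(problem: str, n_agents: int = 8) -> str:
    """Launch several agents in parallel and keep the highest-scoring answer."""
    with concurrent.futures.ThreadPoolExecutor(max_workers=n_agents) as pool:
        candidates = list(pool.map(lambda s: run_agent(problem, s), range(n_agents)))
    return max(candidates, key=score)

if __name__ == "__main__":
    print(solve("hard-to-verify task"))
```

The design question this pattern raises is the one Brown alludes to: in domains without an automatic checker, the quality of the selection step, not the number of agents, becomes the bottleneck.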

OpenAI hopes to further cement its lead in AI with upcoming models such as GPT-5. El Kishky said the company's ultimate goal is to create AI agents that intuitively understand what users want, without cumbersome setup.