When GPT-5.2 first surpassed the human average on a rigorous general intelligence test, it marked a pivotal moment for the AI community, bringing both excitement and caution. Recently, OpenAI co-founder Greg Brockman announced that Poetiq (GPT-5.2X-High), a system built on GPT-5.2, achieved 75% accuracy on the latest version of the ARC-AGI-2 benchmark, well above the human average of 60%. The breakthrough not only set a new record but also spoke directly to the long-standing "performance paradox" of large models: exceptional scores on standard tests, frequent failures in real-world applications.

ARC-AGI-2 (Abstraction and Reasoning Corpus for Artificial General Intelligence, Version 2) was introduced by François Chollet's team in 2025. Its design philosophy is deliberately austere: no teaching to the test, only genuine reasoning. Each question is a new, unseen abstract task that cannot be prepared for in advance; the AI must observe a few examples, infer the underlying rule, transfer it, and complete the reasoning, much as a human would. Any model that relies on memorization or statistical fitting fails here: the benchmark is built specifically to probe "genuine general intelligence."
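The task format described above can be made concrete with a toy sketch. This is an illustrative example, not an actual ARC-AGI-2 item: the hidden rule here is a simple per-cell color substitution, and the solver must recover it from the example pairs alone and apply it to a fresh input.

```python
# Toy ARC-style task (illustrative only, not a real ARC-AGI-2 item).
# A task provides a few input/output grid pairs; the solver infers the
# rule from those pairs and applies it to an unseen test grid.

def infer_color_map(examples):
    """Infer a cell-wise color mapping consistent with all example pairs."""
    mapping = {}
    for inp, out in examples:
        for row_in, row_out in zip(inp, out):
            for a, b in zip(row_in, row_out):
                if mapping.setdefault(a, b) != b:
                    raise ValueError("examples are not a pure color substitution")
    return mapping

def apply_color_map(grid, mapping):
    """Apply the inferred mapping cell by cell; unknown colors pass through."""
    return [[mapping.get(c, c) for c in row] for row in grid]

examples = [
    ([[1, 2], [2, 1]], [[3, 4], [4, 3]]),  # hidden rule: 1 -> 3, 2 -> 4
    ([[2, 2]], [[4, 4]]),
]
rule = infer_color_map(examples)
print(apply_color_map([[1, 1, 2]], rule))  # [[3, 3, 4]]
```

The point of the benchmark is that real tasks use rules far richer than a color swap, so a solver cannot hard-code one rule family; it must discover the abstraction each time.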
The system that topped the leaderboard this time was not an official OpenAI model but a "meta-system" developed by a startup called Poetiq. Poetiq did not retrain GPT-5.2; instead, it used a sophisticated software architecture to automatically schedule, combine, and guide existing large models through complex reasoning. The results were striking: without changing the base model, system performance jumped from roughly human level at 60% to 75%, at a cost of under $8 per question. By comparison, Gemini 3 Deep Think (Preview), which emphasizes "deep thinking," scored only 46% at higher cost.
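The article does not disclose Poetiq's architecture, but the general pattern it describes, wrapping an unchanged base model in software that generates candidates and verifies them before answering, can be sketched. The sketch below is a hedged illustration under stated assumptions: `candidate_programs` is a hypothetical stand-in for sampling candidate solutions from an LLM, and the verification loop keeps only candidates that reproduce every worked example.

```python
# Hedged sketch of a generate-and-verify "meta-system" loop. In a real system
# the candidates would come from LLM calls (e.g. to GPT-5.2); here a
# hypothetical enumerator stands in so the sketch is self-contained.

def candidate_programs():
    """Stand-in for LLM sampling: yields simple candidate transformations
    (add a constant shift to every cell, modulo 10)."""
    for shift in range(10):
        yield lambda grid, s=shift: [[(c + s) % 10 for c in row] for row in grid]

def solve(examples, test_input):
    """Return the first candidate's answer that is verified on ALL examples."""
    for candidate in candidate_programs():
        if all(candidate(inp) == out for inp, out in examples):
            return candidate(test_input)  # candidate survived verification
    return None  # no candidate explained the examples

examples = [([[1, 2]], [[2, 3]]), ([[5]], [[6]])]  # hidden rule: +1 mod 10
print(solve(examples, [[8, 9]]))  # [[9, 0]]
```

The design choice worth noting is that the base model is never modified: all the gain comes from the outer loop that filters unverified answers, which is consistent with the article's claim that performance rose without retraining.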

This 15-point leap reveals a key trend: **the next ceiling for AI lies not in stacking compute, but in system design and human-computer collaboration**. Around the same time, OpenAI posted its 2026 strategic forecast on X, explicitly introducing the concept of a "capability overhang": today's large models can already do far more than what people actually use them for. Models have doctoral-level professional capabilities yet are still used as glorified search engines; companies purchase AI but have not restructured a single workflow.
OpenAI is thus shifting its focus to the application layer: in 2026 it will invest heavily in system integration for medical, business, and everyday scenarios, emphasizing "teaching people to use AI" and "integrating AI into processes." As one widely circulated community comment put it, "The real challenge is not that AI is not strong enough, but that organizations are unwilling to change." Poetiq's success shows that with excellent systems engineering, the potential of existing models can be multiplied.
GPT-5.2 surpassing humans is not the end, but the beginning. It marks the end of the "parameter-only" era and opens a new competition centered on system intelligence, process reengineering, and human-machine symbiosis. The winners of the future may not be the companies with the largest models, but those who best weave AI into the fabric of human life.
