According to the latest results published by ARC Prize, leading AI models differ significantly in both performance and cost. On the ARC-AGI-2 benchmark, which evaluates a model's general reasoning ability, GPT-5 (Advanced) scored 9.9% at a cost of $0.73 per task. Grok 4 (Thinking) scored higher, with an accuracy of 16%, but at a much higher cost of $2 to $4 per task. So while Grok 4 leads on complex reasoning tasks, it is considerably less cost-effective than GPT-5.

Performance and cost comparison of leading language models on the ARC-AGI benchmark. | Image: ARC Prize

On the less demanding ARC-AGI-1 test, Grok 4 again led with an accuracy of 68%, slightly ahead of GPT-5's 65.7%. However, Grok 4's cost of about $1 per task is roughly double GPT-5's $0.51, which makes GPT-5 the more cost-effective model on this test. xAI could still narrow this gap by adjusting its prices.

The report also covers the lightweight versions of GPT-5. GPT-5 Mini scored 54.3% on ARC-AGI-1 and 4.4% on ARC-AGI-2, at costs of $0.12 and $0.20 per task, respectively. The even smaller GPT-5 Nano reached 16.5% ($0.03) on ARC-AGI-1 and 2.5% ($0.03) on ARC-AGI-2.
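The cost-effectiveness comparisons above can be made concrete with a simple ratio. The sketch below is only an illustration, not a metric used by ARC Prize: it divides each quoted accuracy by the quoted cost per task to get "accuracy points per dollar". The names `RESULTS`, `points_per_dollar`, and `rank_by_value` are hypothetical, and the $3 figure for Grok 4 on ARC-AGI-2 is an assumed midpoint of the reported $2 to $4 range.

```python
# Figures quoted in the article: (accuracy in %, cost in USD per task).
# Grok 4's ARC-AGI-2 cost is reported as a $2 to $4 range; the $3 midpoint
# used here is an assumption made only for this illustration.
RESULTS = {
    "ARC-AGI-1": {
        "Grok 4": (68.0, 1.00),
        "GPT-5": (65.7, 0.51),
        "GPT-5 Mini": (54.3, 0.12),
        "GPT-5 Nano": (16.5, 0.03),
    },
    "ARC-AGI-2": {
        "Grok 4": (16.0, 3.00),
        "GPT-5": (9.9, 0.73),
        "GPT-5 Mini": (4.4, 0.20),
        "GPT-5 Nano": (2.5, 0.03),
    },
}


def points_per_dollar(accuracy_pct: float, cost_usd: float) -> float:
    """Accuracy percentage points obtained per dollar spent on one task."""
    return accuracy_pct / cost_usd


def rank_by_value(benchmark: str) -> list[tuple[str, float]]:
    """Return (model, points per dollar) pairs, best value first."""
    scores = {
        model: points_per_dollar(acc, cost)
        for model, (acc, cost) in RESULTS[benchmark].items()
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    for benchmark in RESULTS:
        print(benchmark)
        for model, value in rank_by_value(benchmark):
            print(f"  {model:<11} {value:7.1f} points per dollar")
```

By this crude measure the cheapest model ranks first on both benchmarks, which shows the limit of the ratio: it rewards low prices even when absolute accuracy is very low, so it should be read alongside the raw scores rather than in place of them.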

Test results for Grok 4, GPT-5, and smaller model variants on ARC-AGI-1. | Image: ARC Prize

Notably, on the ARC-AGI-1 test, the o3-preview model, released in December 2024, achieved an accuracy of nearly 80%, far ahead of every other model, though at a much higher cost. OpenAI did not mention the ARC Prize in its GPT-5 presentation. According to The Information, the company may have significantly scaled back the capabilities of o3-preview to adapt it for the later chat versions.

Beyond these two benchmarks, ARC-AGI-3 is also underway. It requires models to solve tasks in a game-like interactive environment through trial and error. Humans handle these tasks easily, but most AI agents still struggle with the visual puzzle games.