Recent results from a benchmark called "Humanity's Last Exam" (HLE) have prompted a reevaluation of what AI systems can actually do. According to a report in Nature, GPT-4o scored only 2.7% on this test, which consists of 2,500 questions written by experts around the world, and the best-performing model reached only about 8%. These results raise the question of whether AI's apparent power reflects real capability or merely an illusion of success.
Traditional AI benchmarks increasingly fail to reflect genuine ability, for two main reasons. The first is "benchmark saturation": models have effectively memorized common test questions, so high scores no longer track real understanding. The second is "answer cheating": because many test answers can be found directly online, a model can appear to answer correctly while relying on retrieval and memorization rather than genuine reasoning.
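To make the "answer cheating" problem concrete, the sketch below shows one naive way to flag benchmark questions whose wording may have leaked into a model's training data: checking for long word n-gram overlaps between a test item and documents in a reference corpus. The function names, the n-gram length, and the threshold are illustrative assumptions, not part of HLE's methodology.

```python
import re

# Minimal sketch of an n-gram contamination check between benchmark items
# and a reference corpus. Names and thresholds are illustrative assumptions,
# not HLE's actual methodology.

def ngrams(text: str, n: int = 5) -> set[tuple[str, ...]]:
    """Return the set of word-level n-grams in a piece of text."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}

def overlap_ratio(item: str, corpus_doc: str, n: int = 5) -> float:
    """Fraction of the benchmark item's n-grams that also appear in a corpus document."""
    item_grams = ngrams(item, n)
    if not item_grams:
        return 0.0
    return len(item_grams & ngrams(corpus_doc, n)) / len(item_grams)

def is_contaminated(item: str, corpus: list[str], threshold: float = 0.2) -> bool:
    """Flag an item if any corpus document shares too many long n-grams with it."""
    return any(overlap_ratio(item, doc) >= threshold for doc in corpus)

if __name__ == "__main__":
    question = "What is the capital of France? The capital of France is Paris."
    corpus = ["An encyclopedia entry: the capital of France is Paris, a major European city."]
    print(is_contaminated(question, corpus))  # True: the answer text appears nearly verbatim
```

A model evaluated on such a contaminated item can look capable while doing little more than pattern matching, which is exactly the failure mode HLE was designed to avoid.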
To address these issues, the designers of HLE recruited nearly 1,000 experts from 50 countries and required that every question demand deep professional knowledge, substantially raising the difficulty. The questions span fields such as mathematics, physics, and chemistry, and each goes through a strict review process to ensure it is genuinely challenging for AI. Mathematics problems, for example, require multi-step logical reasoning, and chemistry problems involve complex reaction mechanisms; the answers cannot be obtained by simple retrieval.
The test results are clear: GPT-4o scored only 2.7%, Claude 3.5 Sonnet and Gemini 1.5 Pro achieved only 4.1% and 4.6% accuracy respectively, and the best-performing model, o1, reached just 8%. These numbers show that even the latest generation of AI models struggles with genuinely difficult questions that demand deep professional knowledge.
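For context, the percentages above are accuracies: the share of questions a model answers correctly. The sketch below shows how such a score is computed, using made-up items and a naive exact-match grading rule as a stand-in; it is not HLE's actual grading pipeline, and the example data is invented for illustration.

```python
# Minimal sketch of exam-style accuracy scoring: accuracy = correct / total.
# The example items and the exact-match grading rule are illustrative
# assumptions, not HLE's actual grading protocol.

from dataclasses import dataclass

@dataclass
class Item:
    question: str
    reference: str  # gold answer written by an expert

def grade_answer(prediction: str, reference: str) -> bool:
    """Naive exact-match grading after light normalization."""
    return prediction.strip().lower() == reference.strip().lower()

def accuracy(items: list[Item], predictions: list[str]) -> float:
    """Fraction of items answered correctly (e.g. 2.7% means roughly 27 right per 1,000 asked)."""
    if not items:
        return 0.0
    correct = sum(
        grade_answer(pred, item.reference)
        for item, pred in zip(items, predictions)
    )
    return correct / len(items)

if __name__ == "__main__":
    items = [
        Item("2 + 2 = ?", "4"),
        Item("Chemical symbol for gold?", "Au"),
        Item("Derivative of x^2?", "2x"),
    ]
    predictions = ["4", "Ag", "x"]  # one correct answer out of three
    print(f"accuracy = {accuracy(items, predictions):.1%}")  # -> accuracy = 33.3%
```

Seen this way, a single-digit score on 2,500 expert-written questions means the models answered only a small handful correctly.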
The HLE results expose a sharp contrast between AI's true capabilities and the high scores it posts on traditional benchmarks, and they push us to reconsider whether AI is as smart as we imagine or whether that impression is merely an illusion of success.

