Recent results from a benchmark called "Humanity's Last Exam" (HLE) have prompted a reevaluation of what AI systems can actually do. According to a report in Nature, GPT-4o scored only 2.7% on this test, which consists of 2,500 questions written by experts around the world, and the best-performing model reached only about 8%. These results raise the question of whether AI's apparent power reflects real capability or merely an illusion of success.
Traditional AI benchmarks increasingly fail to reflect genuine ability, for two main reasons. The first is "benchmark saturation": models have effectively memorized common test questions, so high scores no longer track real understanding. The second is "answer cheating": because many test answers can be found directly online, a model can appear to answer correctly while relying on retrieval and memorization rather than genuine reasoning.
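To make the "answer cheating" problem concrete, the sketch below shows one naive way to flag benchmark questions whose wording may have leaked into a model's training data: checking for long word n-gram overlaps between a test item and documents in a reference corpus. The function names, the n-gram length, and the threshold are illustrative assumptions, not part of HLE's methodology.

```python
import re

# Minimal sketch of an n-gram contamination check between benchmark items
# and a reference corpus. Names and thresholds are illustrative assumptions,
# not HLE's actual methodology.

def ngrams(text: str, n: int = 5) -> set[tuple[str, ...]]:
    """Return the set of word-level n-grams in a piece of text."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}

def overlap_ratio(item: str, corpus_doc: str, n: int = 5) -> float:
    """Fraction of the benchmark item's n-grams that also appear in a corpus document."""
    item_grams = ngrams(item, n)
    if not item_grams:
        return 0.0
    return len(item_grams & ngrams(corpus_doc, n)) / len(item_grams)

def is_contaminated(item: str, corpus: list[str], threshold: float = 0.2) -> bool:
    """Flag an item if any corpus document shares too many long n-grams with it."""
    return any(overlap_ratio(item, doc) >= threshold for doc in corpus)

if __name__ == "__main__":
    question = "What is the capital of France? The capital of France is Paris."
    corpus = ["An encyclopedia entry: the capital of France is Paris, a major European city."]
    print(is_contaminated(question, corpus))  # True: the answer text appears nearly verbatim
```

A model evaluated on such a contaminated item can look capable while doing little more than pattern matching, which is exactly the failure mode HLE was designed to avoid.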
To address these issues, the designers of HLE recruited nearly 1,000 experts from 50 countries and required that every question demand deep professional knowledge, substantially raising the difficulty. The questions span fields such as mathematics, physics, and chemistry, and each goes through a strict review process to ensure it is genuinely challenging for AI. Mathematics problems, for example, require multi-step logical reasoning, and chemistry problems involve complex reaction mechanisms; the answers cannot be obtained by simple retrieval.
The test results are clear: GPT-4o scored only 2.7%, Claude 3.5 Sonnet and Gemini 1.5 Pro achieved only 4.1% and 4.6% accuracy respectively, and the best-performing model, o1, reached just 8%. These numbers show that even the latest generation of AI models struggles with genuinely difficult questions that demand deep professional knowledge.
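For context, the percentages above are accuracies: the share of questions a model answers correctly. The sketch below shows how such a score is computed, using made-up items and a naive exact-match grading rule as a stand-in; it is not HLE's actual grading pipeline, and the example data is invented for illustration.

```python
# Minimal sketch of exam-style accuracy scoring: accuracy = correct / total.
# The example items and the exact-match grading rule are illustrative
# assumptions, not HLE's actual grading protocol.

from dataclasses import dataclass

@dataclass
class Item:
    question: str
    reference: str  # gold answer written by an expert

def grade_answer(prediction: str, reference: str) -> bool:
    """Naive exact-match grading after light normalization."""
    return prediction.strip().lower() == reference.strip().lower()

def accuracy(items: list[Item], predictions: list[str]) -> float:
    """Fraction of items answered correctly (e.g. 2.7% means roughly 27 right per 1,000 asked)."""
    if not items:
        return 0.0
    correct = sum(
        grade_answer(pred, item.reference)
        for item, pred in zip(items, predictions)
    )
    return correct / len(items)

if __name__ == "__main__":
    items = [
        Item("2 + 2 = ?", "4"),
        Item("Chemical symbol for gold?", "Au"),
        Item("Derivative of x^2?", "2x"),
    ]
    predictions = ["4", "Ag", "x"]  # one correct answer out of three
    print(f"accuracy = {accuracy(items, predictions):.1%}")  # -> accuracy = 33.3%
```

Seen this way, a single-digit score on 2,500 expert-written questions means the models answered only a small handful correctly.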
The HLE results expose a sharp contrast between AI's true capabilities and the high scores it posts on traditional benchmarks, and they push us to reconsider whether AI is as smart as we imagine or whether that impression is merely an illusion of success.

