New AI Evaluation Benchmark: GPT-5 and Other Cutting-Edge Models Score Zero. How Hard Is PhD-Level Reasoning?
The FormulaOne AI benchmark is drawing attention because top models such as GPT-5 and Grok 4 score zero on it. Developed by AAI, it comprises 220 graph-based dynamic programming problems spanning fields such as topology and combinatorics, with difficulty ranging from medium to research level...