How Far Is AI from the Nobel Prize? Top Models Score Below 10% Accuracy on CritPt, a Doctoral-Level Physics Benchmark
The CritPt benchmark, which tests doctoral-level research skills rather than memorization, shows that top AI models such as Gemini 3 Pro and GPT-5 are still far from being autonomous scientists.