AI's Big Three Stumble: Accuracy on the Latest Programming Benchmark Falls Below 25% Across the Board, and GPT-5 Is No Exception
Top AI models GPT-5, Claude Opus 4.1, and Gemini 2.5 performed poorly on SWE-Bench Pro, with solve rates below 25%, highlighting their limitations on complex programming tasks.