With the college entrance examination (gaokao) underway, the mathematics paper has once again become the most dreaded part of the exam for many candidates. Six artificial intelligence models took on the challenge this year: DouBao from ByteDance, YuanBao from Tencent, Tongyi from Alibaba, WenXin X1 Turbo from Baidu, DeepSeek from Shendu Qiusuo, and o3 from OpenAI. The test used the 14 objective questions from the 2025 New Curriculum Standard Paper I, worth a total of 73 points and covering single-choice, multiple-choice, and fill-in-the-blank questions.
To keep the test fair, all models answered without system prompts or internet search, and each model was allowed only one attempt. The final results were unexpected. DouBao and YuanBao tied for first place with 68 points each, showing strong reasoning ability. DeepSeek and Tongyi trailed slightly, with 63 and 62 points respectively. WenXin X1 and o3 were disappointing, especially o3, which scored only 34, suggesting it is poorly adapted to the style of Chinese gaokao questions.
By question type, DouBao, Tongyi, and YuanBao led the single-choice section with 35 points each. DeepSeek scored 30 after two mistakes, while o3 scored only 20 on the single-choice questions, getting half of them wrong. In the multiple-choice section, DouBao, DeepSeek, and YuanBao all answered every one of the three questions correctly, showing strong consistency. By comparison, Tongyi answered quickly but made judgment errors at critical points, which cost it points.
This test revealed both the potential and the shortcomings of these AI models on gaokao mathematics, and it also reflected their progress in reasoning and self-reflection. Compared with last year, the models showed clear improvements in handling details, applying formulas, and reasoning logically. Although errors and weaknesses remain, this competition marks a solid step forward for AI mathematical ability.