SuperCLUE-VLM has released its latest evaluation results for Chinese multimodal vision-language models, a notable milestone for artificial intelligence in China. ByteDance's Doubao-Seed-2.0-Pro-260215 ranked first overall with a score of 90.66, ahead of Google's Gemini-3.1-Pro-Preview at 89.35.

The evaluation covered 17 mainstream vision-language models from Chinese and international developers. Domestic models performed well and took several of the top positions. Alibaba's Qwen3.5 series, SenseTime's SenseNova, and Zhipu's GLM also performed strongly. By contrast, OpenAI's GPT-5.4 and other well-known overseas models placed only in the middle of the ranking, which underscores how quickly domestic models are improving.
The benchmark is organized into three main dimensions (basic cognition, visual reasoning, and visual applications) comprising 25 specific tasks, ranging from general recognition to medical imaging. Domestic models performed particularly well in basic cognition and data analysis, generally scoring above 90, which indicates mature and stable capabilities in these areas. In visual reasoning and in professional applications such as industrial and medical scenarios, however, they still have room to improve, and some specialized scenarios produced relatively low scores.
Key Points:
🌟 Doubao-Seed-2.0-Pro-260215 scored 90.66 and ranked first, ahead of Google's Gemini-3.1-Pro-Preview (89.35).
📊 Domestic models generally scored above 90 in basic cognition and data analysis, showing stable performance.
🛠️ Domestic models still need improvement in visual reasoning and professional applications, with lower scores in some specialized scenarios.


