In SuperCLUE-VLM, the Chinese multimodal vision-language model evaluation benchmark released on August 28, Gemini-2.5-Pro ranked first with a total score of 74.99, and OpenAI's GPT-5 (high) ranked second with 68.59.

The benchmark is built around three core dimensions: basic cognition, visual reasoning, and visual application. It is tailored to the characteristics of Chinese scenarios and aims to provide an objective and fair evaluation standard for the development of multimodal vision-language models.

The evaluation covered 15 mainstream domestic and international multimodal models, including Claude-Opus-4.1, Gemini-2.5-Pro, GPT-5 (high), ERNIE-4.5-Turbo-VL, Doubao-Seed-1.6-thinking, hunyuan-t1-vision, and Qwen-VL-Max-Latest.

In the final ranking, Gemini-2.5-Pro took first place with a total score of 74.99, followed by OpenAI's GPT-5 (high) with 68.59. Baidu's ERNIE-4.5-Turbo-VL tied with other domestic models, showing strong market competitiveness.