In the recently released December edition of the SuperCLUE-VLM monthly multimodal visual-language benchmark, global AI large models faced intense competition. The evaluation covered three core dimensions: basic cognition, visual reasoning, and visual application, comprehensively testing both the "eyesight" and the "brainpower" of the participating models.

In this competition, Google's Gemini-3-pro topped the list with 83.64 points, finishing first across all three dimensions, while SenseTime's SenseNova and ByteDance's Douyin model took second and third place. Additionally, Baidu's ERNIE-5.0-Preview and Alibaba's Qwen3-vl also made the top five. Among them, Qwen3-vl became the first open-source model to break the 70-point threshold on the list, a significant contribution to the open-source community thanks to its strong visual analysis capabilities.
By comparison, some established international models performed less strongly. Anthropic's Claude-opus-4-5 scored 71.44 points, while OpenAI's GPT-5.2 (high) unexpectedly fell out of the first tier, ranking lower with only 69.16 points. This shift in the rankings signals that competition in the multimodal AI field is entering a more intense phase.
Key Points:
🏆 Global Leader: Google's Gemini-3-pro won with 83.64 points, ranking first in all three indicators: basic cognition, visual reasoning, and application.
🇨🇳 Domestic Breakthrough: SenseTime's SenseNova and ByteDance's Douyin ranked second and third, showcasing China's strong competitiveness in visual understanding.
📊 Industry Reorganization: Qwen3-vl became the first open-source model to break 70 points, while GPT-5.2 (high) underperformed expectations in this visual evaluation, ranking toward the back of the leading group.