In the recently released SuperCLUE-VLM12 monthly multimodal vision-language benchmark, large AI models from around the world competed fiercely. The evaluation covers three core dimensions: basic cognition, visual reasoning, and visual application, testing both the "eyesight" and the "brainpower" of each model.

Google's Gemini-3-pro took first place with 83.64 points, more than eight points ahead of the runner-up. The detailed data show that it led in all three sub-indicators. Chinese domestic models also performed well: SenseTime's SenseNova V6.5Pro ranked second with 75.35 points, and ByteDance's Douyin visual version took third with 73.15 points. Notably, Douyin outperformed some strong international competitors in basic cognition.

Baidu's ERNIE-5.0-Preview and Alibaba's Qwen3-vl also made the top five. Qwen3-vl became the first open-source model on the list to break the 70-point threshold, a notable result for the open-source community that reflects its strong visual analysis capabilities.

By comparison, some established international models performed less strongly. Anthropic's Claude-opus-4-5 scored 71.44 points, while OpenAI's GPT-5.2 (high) unexpectedly fell out of the first tier with only 69.16 points. The shift in rankings suggests that competition in multimodal AI is becoming more intense.

Key Points:

  • 🏆 Global Leader: Google's Gemini-3-pro won with 83.64 points, ranking first in all three indicators: basic cognition, visual reasoning, and visual application.

  • 🇨🇳 Domestic Breakthrough: SenseTime's SenseNova and ByteDance's Douyin ranked second and third, showcasing China's strong competitiveness in visual understanding.

  • 📊 Industry Reshuffle: Qwen3-vl became the first open-source model to break 70 points, while GPT-5.2 (high) fell short of expectations in this visual evaluation, scoring only 69.16 points.