The competitive landscape of global multimodal large models has shifted again. The evaluation platform SuperCLUE recently released its SuperCLUE-VLM comprehensive ranking of multimodal vision-language models for December 2025. Google's Gemini-3-Pro led with 83.64 points, demonstrating a clear advantage in visual understanding and reasoning. SenseTime's SenseNova V6.5Pro took second place with 75.35 points, and ByteDance's Douyin large model followed in third with 73.15 points. Overall, domestic models performed strongly, reflecting China's rapid progress in the multimodal field.
Evaluation Dimensions: Three Capabilities Fully Measure a Model's "Vision"
SuperCLUE-VLM evaluates a model's real visual understanding ability from three core dimensions:
- Basic Cognition: Identify basic elements such as objects, text, and scenes in images;
- Visual Reasoning: Understand the logic, causal relationships, and implicit information in images;
- Visual Application: Complete tasks such as image-text generation, cross-modal Q&A, and tool invocation.
Gemini-3-Pro Dominates, Domestic Models Catch Up
Google's Gemini-3-Pro leads in all three indicators:
- Basic Cognition: 89.01 points
- Visual Reasoning: 82.82 points
- Visual Application: 79.09 points
Its overall performance far surpasses that of its competitors, reinforcing Google's dominant position in the multimodal field.
Domestic models also showed strong performance:
- SenseTime's SenseNova V6.5Pro ranks second with 75.35 points, showing balanced reasoning and application capabilities;
- ByteDance's Douyin large model ranks third with 73.15 points, achieving an impressive 82.70 points in basic cognition (surpassing some international models), though it is slightly weaker in visual reasoning;
- Baidu's ERNIE-5.0-Preview and Alibaba's Qwen3-VL follow closely, both entering the top five.
Notably, Qwen3-VL became the first open-source multimodal model on the list to exceed 70 points in total, providing global developers with a high-performance, commercializable open foundation.

International Giants Show Divergence: Claude Performs Steadily, GPT-5.2 Falls Behind Unexpectedly
In the international group, Anthropic's Claude-opus-4-5 scored 71.44 points, placing it in the upper middle of the list and continuing its strength in language understanding. OpenAI's GPT-5.2 (high configuration), however, scored only 69.16 points, a lower-than-expected placement that has sparked discussion about the direction of its multimodal optimization.
AIbase Observation: The Multimodal Competition Enters a New "Practical" Stage
The SuperCLUE-VLM list is not just a technical ranking but also reflects industry trends:
- Rise of Open Source Models: Qwen3-VL proves that the open source approach can also achieve high performance, promoting the democratization of technology;
- Focus on Scenario Implementation by Domestic Models: Models like Douyin and SenseTime perform well in basic cognition, aligning with frequent needs such as Chinese internet image-text understanding and short video analysis;
- Visual Reasoning Remains a Bottleneck: Most models still have gaps in advanced tasks like complex logic and causal inference, which is a key factor behind Gemini's continued leadership.

