AIBase


Evaluation of Multi-modal Large Model Visual Reasoning Capability: o3 Scores Only 25.8%

RBench-V, a new benchmark designed specifically to test the visual reasoning capabilities of multi-modal large models, was recently released by research teams from Tsinghua University, Tencent HUNYUAN, Stanford University, and Carnegie Mellon University. The benchmark aims to fill a gap in current evaluation systems, which do not adequately assess models' visual output capabilities, and so to give a more comprehensive picture of how existing models perform. RBench-V consists of 803 questions spanning several fields, including geometry and graph theory, mechanics and electromagnetism, multi-target recognition, and path planning.

5.8k 5 days ago
© 2025 AIBase