
Evaluating the Visual Reasoning of Multi-modal Large Models: o3 Scores Only 25.8%

Research teams from Tsinghua University, Tencent HUNYUAN, Stanford University, and Carnegie Mellon University recently released RBench-V, a new benchmark designed specifically to test the visual reasoning capabilities of multi-modal large models. The benchmark aims to fill a gap in current evaluation practice, which gives little attention to models' visual output capabilities, and so to give a more complete picture of how existing models perform. RBench-V consists of 803 questions across several areas, including geometry and graph theory, mechanics and electromagnetism, multi-target recognition, and path planning.
