Xiaomi's large model team has open-sourced its latest multimodal large model, MiMo-VL-7B-2508, released in both RL and SFT versions.
According to official figures, the new model sets new records across four core capabilities: multidisciplinary reasoning, document understanding, GUI grounding, and video understanding. Its MMMU score surpasses 70 for the first time, ChartQA reaches 94.4, ScreenSpot-v2 reaches 92.5, and VideoMME improves to 70.8.
This iteration improves the stability of both the reinforcement learning and supervised fine-tuning processes, lifting the model's score on Xiaomi's internal VLM Arena from 1093.9 to 1131.2.
Notably, users can switch between "thinking" and "non-thinking" modes with the "/no_think" instruction when asking a question: the former displays the full reasoning chain and achieves a 100% mode-control success rate, while the latter generates the answer directly, responding faster with a 99.84% success rate.
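As a rough illustration of the mode switch, the sketch below appends "/no_think" to a text prompt before generation. It assumes the checkpoint works with the standard transformers Auto classes and chat template; the exact loading classes and prompt format for this multimodal model may differ, so treat this as a hedged example rather than the official usage.

```python
# Minimal sketch: toggling MiMo-VL's "non-thinking" mode via the "/no_think" tag.
# Assumes the checkpoint loads through the standard transformers Auto classes and
# chat template; the actual multimodal loading path may differ in practice.
from transformers import AutoTokenizer, AutoModelForCausalLM

MODEL_ID = "XiaomiMiMo/MiMo-VL-7B-RL-2508"

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID, device_map="auto", trust_remote_code=True
)

def ask(question: str, think: bool = True) -> str:
    # "Thinking" mode is the default; appending "/no_think" asks the model
    # to skip the visible reasoning chain and answer directly.
    content = question if think else f"{question} /no_think"
    messages = [{"role": "user", "content": content}]
    inputs = tokenizer.apply_chat_template(
        messages, add_generation_prompt=True, return_tensors="pt"
    ).to(model.device)
    output = model.generate(inputs, max_new_tokens=512)
    return tokenizer.decode(output[0][inputs.shape[-1]:], skip_special_tokens=True)

print(ask("Explain the key difference between SFT and RL.", think=False))
```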
MiMo-VL-7B-RL-2508
Recommended for most users and use cases.
Open source address: https://huggingface.co/XiaomiMiMo/MiMo-VL-7B-RL-2508
MiMo-VL-7B-SFT-2508
Users can run their own SFT and RL on top of this model as needed (a minimal fine-tuning sketch follows the open-source link below). Compared with the previous SFT release, this version offers better RL stability.
Open source address: https://huggingface.co/XiaomiMiMo/MiMo-VL-7B-SFT-2508
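For readers who want to build on the SFT checkpoint, the sketch below shows one common starting point: attaching LoRA adapters with the peft library before running a custom training loop. The loading class and the target module names are assumptions, not confirmed details of the MiMo-VL architecture, and should be adjusted to the actual model.

```python
# Minimal sketch: using the SFT checkpoint as a base for further fine-tuning
# with LoRA via the peft library. The loading class and target module names
# are assumptions; adjust them to the actual MiMo-VL architecture.
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

base = AutoModelForCausalLM.from_pretrained(
    "XiaomiMiMo/MiMo-VL-7B-SFT-2508", trust_remote_code=True
)

lora_cfg = LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules=["q_proj", "v_proj"],  # assumed attention projection names
    task_type="CAUSAL_LM",
)
model = get_peft_model(base, lora_cfg)
model.print_trainable_parameters()
# Continue with your own SFT or RL training loop from here.
```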