Xiaomi Unveils Another AI Star! Open-Source Multimodal Large Model MiMo-VL-7B-2508 with Significant Performance Improvement, Supports Thinking Mode Switching

AIbase基地

Published in AI News · 3 min read · Aug 12, 2025

Xiaomi has open-sourced a new version of its multimodal large model, Xiaomi MiMo-VL-7B-2508, releasing two variants at the same time: SFT and RL. The upgrade refines the model's output modes and improves the stability of RL training, with significant gains across a range of capability evaluations. Users can also switch between "thinking mode" and "non-thinking mode" to suit different scenarios.

Compared with the MiMo-VL-7B-RL released in May 2025, the new version improves on several widely used benchmarks:

Subject reasoning (MMMU): up from 66.7 to 70.6, passing 70 points for the first time

Document understanding (ChartQA): up from 91.7 to 94.4

GUI localization (ScreenSpot-v2): up from 90.5 to 92.5

Video understanding (VideoMME): up from 67.4 to 70.8
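To make the size of each improvement easier to compare, the gains can be worked out directly from the scores above. The following is a minimal sketch: the scores are taken from this article, while the `benchmarks` dictionary and the `gains` helper are illustrative names, not part of any Xiaomi tooling.

```python
# Scores as reported in the article: (MiMo-VL-7B-RL from May, MiMo-VL-7B-RL-2508).
benchmarks = {
    "MMMU": (66.7, 70.6),
    "ChartQA": (91.7, 94.4),
    "ScreenSpot-v2": (90.5, 92.5),
    "VideoMME": (67.4, 70.8),
}


def gains(scores):
    """Return the absolute point gain per benchmark, rounded to one decimal."""
    return {name: round(new - old, 1) for name, (old, new) in scores.items()}


if __name__ == "__main__":
    for name, delta in gains(benchmarks).items():
        print(f"{name}: +{delta:.1f} points")
```

By this measure the largest gain is on MMMU and the smallest on ScreenSpot-v2, where the previous score was already above 90.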

On the interaction side, the new version lets users control whether the model thinks before answering. The default "thinking mode" displays the complete reasoning process and delivers the strongest overall performance, with a 100% control success rate. The "non-thinking mode" skips the reasoning process for faster responses, with a 99.84% control success rate, and suits tasks that need low latency.
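The article does not say how the mode switch is triggered. As a rough sketch only, assume the toggle is a text marker appended to the user's query: the marker string `/no_think` and the helper `build_query` below are assumptions made for illustration and should be checked against the model's own documentation before use.

```python
# Assumption: non-thinking mode is requested by appending a marker to the query.
# The marker value is NOT stated in this article; verify it in the model docs.
NO_THINK_TAG = "/no_think"


def build_query(text: str, thinking: bool = True) -> str:
    """Return the user query, tagged for non-thinking mode when requested."""
    text = text.strip()
    if thinking:
        # Thinking mode is the default, so the query is sent unchanged.
        return text
    return f"{text} {NO_THINK_TAG}"


if __name__ == "__main__":
    print(build_query("Describe the chart in this image."))
    print(build_query("Describe the chart in this image.", thinking=False))
```

Keeping the switch in one helper means a latency-sensitive caller can opt out of reasoning per request while everything else stays on the default.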

On Xiaomi's internal VLM Arena, MiMo-VL-7B-RL-2508 scored 1131.2 points, well above the previous generation's 1093.9. The evaluation results show that the model surpasses its predecessor on most benchmarks, and that it maintains strong performance on perception tasks even in non-thinking mode. According to the results, MiMo-VL-7B-RL-2508 also remains ahead of comparable open-source multimodal models that support thinking.


This article is from AIbase Daily



