Xiaomi Open-Sources Its Multimodal Large Model MiMo-VL

AIbase基地

Published in AI News · 3 min read · May 30, 2025


Recently, Xiaomi released MiMo-VL, a multimodal model that picks up where MiMo-7B left off and shows strong capabilities across multiple domains. It clearly outperforms comparably sized models on general question answering and on understanding and reasoning tasks involving images, video, and language. On GUI grounding it even rivals specialized models, laying the groundwork for the coming agent era.

MiMo-VL-7B has achieved remarkable results in multimodal reasoning. Despite having only 7 billion parameters, it surpasses Alibaba's Qwen2.5-VL-72B and QVQ-72B-Preview, which have roughly ten times as many parameters, on OlympiadBench and on several math benchmarks (MathVision, MathVerse). It also beats the closed-source GPT-4o. In Xiaomi's internal model-arena evaluation based on real user experience, MiMo-VL-7B ranked above GPT-4o, making it a standout among open-source models. In practice, the model handles complex image reasoning and question answering well and shows strong potential for GUI tasks spanning more than ten steps, such as adding a Xiaomi SU7 to a user's wishlist.

MiMo-VL-7B's comprehensive visual perception capabilities stem from high-quality pre-training data and an innovative hybrid online reinforcement learning algorithm (MORL). Across a multi-stage pre-training process, Xiaomi collected, cleaned, and synthesized 2.4 trillion tokens of high-quality multimodal data, covering image-text pairs, video-text pairs, GUI operation sequences, and other types. Adjusting the proportions of each data type from stage to stage strengthened the model's long-range multimodal reasoning. The hybrid online reinforcement learning algorithm combines several feedback signals, including text reasoning, multimodal perception plus reasoning, and RLHF. Training on these signals together online keeps optimization stable and fast, improving the model's reasoning, perception, and user experience across the board.
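To make "combining feedback signals" more concrete, here is a minimal sketch of one way heterogeneous rewards can share a single RL loop: each sample carries a task tag and is routed to a task-specific reward function. This is an illustrative assumption, not Xiaomi's MORL implementation. The task names (`math`, `grounding`, `chat`), the `Sample` type, and all three reward functions are hypothetical, and the "preference" score is a toy stand-in for a learned RLHF reward model. The policy-update step itself is omitted.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class Sample:
    task: str       # hypothetical task tag: "math", "grounding", or "chat"
    response: str   # model output
    reference: str  # ground-truth answer (ignored for "chat")


def exact_match_reward(s: Sample) -> float:
    # Verifiable text-reasoning reward: 1.0 if the answer matches the reference.
    return 1.0 if s.response.strip() == s.reference.strip() else 0.0


def iou_reward(s: Sample) -> float:
    # Verifiable perception reward: IoU of predicted vs. reference boxes,
    # each encoded as "x1,y1,x2,y2" (hypothetical format).
    ax1, ay1, ax2, ay2 = map(float, s.response.split(","))
    bx1, by1, bx2, by2 = map(float, s.reference.split(","))
    iw = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    ih = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = iw * ih
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0


def preference_reward(s: Sample) -> float:
    # Toy stand-in for a learned RLHF reward model: a capped length score.
    return min(len(s.response) / 100.0, 1.0)


# Hypothetical routing table from task tag to reward function.
REWARD_FNS: Dict[str, Callable[[Sample], float]] = {
    "math": exact_match_reward,
    "grounding": iou_reward,
    "chat": preference_reward,
}


def mixed_reward(s: Sample) -> float:
    # Route each sample to the reward function for its task type.
    return REWARD_FNS[s.task](s)


if __name__ == "__main__":
    batch = [
        Sample("math", "42", "42"),
        Sample("grounding", "0,0,2,2", "1,1,3,3"),
        Sample("chat", "a" * 50, ""),
    ]
    print([round(mixed_reward(s), 3) for s in batch])
```

Routing by task keeps each signal on its own scale, so verifiable rewards and a preference-model score can be mixed in one batch without hand-tuned weights; a real system would feed these scalars into its on-policy update.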

Related link: https://huggingface.co/XiaomiMiMo

This article is from AIbase Daily