The open-source AI community recently saw the official release of MiniCPM-V 4.5, a multimodal large language model designed for edge devices. At roughly 8 billion parameters, the model runs efficiently on smartphones and tablets, opening up new possibilities for mobile AI applications.
Technical Features and Performance
MiniCPM-V 4.5 adopts a lightweight design optimized for edge devices. According to test data released by the development team, the model scored 77.2 in the OpenCompass comprehensive evaluation, a standout result among open-source models of its size. It supports tasks such as single-image understanding, multi-image reasoning, and video analysis.
For on-device deployment, first-token latency on the iPhone 16 Pro Max is approximately 2 seconds, with a decoding speed exceeding 17 tokens per second. The model's 3D-Resampler technology achieves up to a 96× compression rate on video tokens, encoding 6 frames of video into just 64 tokens and enabling real-time video understanding at up to 10 FPS.
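The quoted figures lend themselves to some back-of-the-envelope arithmetic. The sketch below uses only the numbers reported above; the assumption that frames are always encoded in full 6-frame groups is ours, not a documented detail of the model.

```python
# Rough latency and token math from the article's reported figures.
FIRST_TOKEN_LATENCY_S = 2.0   # reported first-token latency on iPhone 16 Pro Max
DECODE_TOKENS_PER_S = 17.0    # reported decoding speed
FRAMES_PER_GROUP = 6          # frames encoded together by the 3D-Resampler
TOKENS_PER_GROUP = 64         # visual tokens produced per frame group

def estimated_response_time(output_tokens: int) -> float:
    """Rough end-to-end time to generate `output_tokens` tokens."""
    return FIRST_TOKEN_LATENCY_S + output_tokens / DECODE_TOKENS_PER_S

def video_tokens(num_frames: int) -> int:
    """Visual tokens for a clip, assuming full 6-frame groups (our assumption)."""
    groups = -(-num_frames // FRAMES_PER_GROUP)  # ceiling division
    return groups * TOKENS_PER_GROUP

print(f"~{estimated_response_time(100):.1f} s for a 100-token answer")
print(f"{video_tokens(60)} visual tokens for a 60-frame clip")
```

Under these assumptions, a 100-token answer takes roughly 8 seconds on the phone, and a 60-frame clip costs only 640 visual tokens of context.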
Optical character recognition is a key optimization focus for this model. Built on the LLaVA-UHD architecture, the model supports high-resolution images of up to 1.8 million pixels and achieves 85.7% accuracy on the OCRBench test. It also supports more than 30 languages, including English, Chinese, German, and French.
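A practical consequence of a fixed pixel budget is that oversized inputs must be downscaled before (or by) the preprocessor. The helper below is a minimal sketch using the 1.8-megapixel figure from the article; the uniform-resize policy is an assumption for illustration, not the model's documented preprocessing.

```python
# Check an image against the reported 1.8-megapixel input budget and
# compute the uniform downscale factor needed to fit. The budget comes
# from the article; the resizing policy here is our own assumption.
import math

MAX_PIXELS = 1_800_000  # reported maximum supported resolution

def fits(width: int, height: int) -> bool:
    """True if the image is within the pixel budget."""
    return width * height <= MAX_PIXELS

def downscale_factor(width: int, height: int) -> float:
    """Factor to multiply both sides by so the image fits the budget."""
    if fits(width, height):
        return 1.0
    return math.sqrt(MAX_PIXELS / (width * height))

print(fits(1280, 960))                      # ~1.23 MP, within budget
print(f"{downscale_factor(1920, 1080):.3f}")  # 1080p (~2.07 MP) must shrink
```

For example, a 1920×1080 frame (~2.07 MP) exceeds the budget and would need both sides scaled by roughly 0.93 to fit.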
Innovative Mechanisms and Technical Architecture
MiniCPM-V 4.5 introduces a controllable hybrid thinking mechanism that lets users switch between a fast response mode and a deep reasoning mode via a parameter setting. Fast mode suits routine question answering, while deep mode works through complex problems with step-by-step reasoning.
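To make the switch concrete, here is a minimal sketch of how an application might surface such a toggle when building inference requests. The function name `build_request` and the field name `enable_thinking` are hypothetical illustrations, not MiniCPM-V 4.5's actual API; consult the project's documentation for the real parameter.

```python
# Illustrative only: routing a chat request to fast or deep mode via a flag.
# `build_request` and the "enable_thinking" field are hypothetical names.

def build_request(question: str, enable_thinking: bool) -> dict:
    """Package a chat request, selecting fast or deep reasoning mode."""
    return {
        "messages": [{"role": "user", "content": question}],
        # False -> fast mode: direct answers for routine Q&A.
        # True  -> deep mode: step-by-step reasoning for complex problems.
        "enable_thinking": enable_thinking,
    }

quick = build_request("What breed is the dog in this photo?", enable_thinking=False)
hard = build_request("Estimate the total bill from this receipt photo.", enable_thinking=True)
print(quick["enable_thinking"], hard["enable_thinking"])
```

The design point is that the same model serves both workloads; the caller trades latency for reasoning depth per request rather than loading a separate model.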
The model is trained with RLAIF-V and VisCPM techniques, which help reduce hallucinations. The development team states that this training approach improves the accuracy and reliability of the model's responses.
Open Source Ecosystem and Deployment Support
MiniCPM-V 4.5 is released under the Apache-2.0 license: academic use is free, while commercial applications require a simple registration. The model is compatible with multiple inference frameworks, including llama.cpp, Ollama, vLLM, and SGLang, and ships in 16 quantization formats to suit different hardware configurations.
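The point of offering many quantization formats is to fit the weights into different memory budgets. The arithmetic below is a rough sketch assuming the model is in the ~8-billion-parameter class; it counts weight storage only (no activations or KV cache), and real quantized files add per-block scales and metadata on top.

```python
# Approximate weight storage for an ~8B-parameter model at common
# quantization bit-widths. Pure arithmetic, not measured file sizes.
PARAMS = 8_000_000_000  # assumed parameter count, ~8B class

def weight_gib(bits_per_weight: float, params: int = PARAMS) -> float:
    """Approximate weight storage in GiB at the given bit-width."""
    return params * bits_per_weight / 8 / 2**30

for name, bits in [("fp16", 16), ("int8", 8), ("int4", 4)]:
    print(f"{name}: ~{weight_gib(bits):.1f} GiB")
```

This is why 4-bit formats matter on phones and tablets: they cut the weight footprint to roughly a quarter of fp16, bringing an 8B-class model within reach of devices with 6–8 GB of RAM.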
The development team has also released an iOS application so users can try the model on Apple devices. Developers can obtain the model weights, code, and documentation through Hugging Face and GitHub, set up a local web interface via Gradio, or run accelerated inference on NVIDIA GPUs.
Application Prospects and Limitations
As a multimodal model optimized for mobile devices, MiniCPM-V 4.5 is valuable in privacy-sensitive and offline scenarios. Its lightweight design lowers the barrier to deploying AI capabilities, giving individual users and developers a new option.
That said, given its parameter scale, the model may hit performance limits on extremely complex tasks, and users should choose a model suited to their specific needs. The development team also reminds users that the model's output is derived from its training data, and that users must ensure compliance and bear responsibility for how the output is used.
Industry Impact
The release of MiniCPM-V 4.5 reflects the open-source AI community's push toward edge deployment. As mobile computing power continues to improve, lightweight multimodal models like this one may open new paths for bringing AI applications to a broad audience.
The project's open-source nature also gives researchers and developers a foundation to study and build upon, and it is expected to spur further development of edge-side AI technology.
Project Address: https://github.com/OpenBMB/MiniCPM-V