Smartphones Can Also Run! Mingsheng Intelligence Launches MiniCPM-V4.5: 410 Million Parameters Outperform GPT-4.1-mini

FaceWall Intelligence, in collaboration with the NLP Lab of Tsinghua University, officially launched its latest edge-side multimodal large model MiniCPM-V4.5, marking a new height in edge AI technology.

This latest masterpiece of the MiniCPM series sets new standards for edge-side multimodal models with its outstanding performance, efficient deployment capabilities, and wide application scenarios. Below, AIbase provides a detailed analysis of this breakthrough technology.

Technical Breakthrough: Smaller Parameters, Stronger Performance

MiniCPM-V4.5 is built on the SigLIP2-400M visual module and the MiniCPM4-3B language model, with a total parameter count of only 410 million. It has shown impressive performance in multiple benchmark tests. According to official data, MiniCPM-V4.5 achieved an average score of 69.0 in the OpenCompass comprehensive evaluation, surpassing GPT-4.1-mini (version 20250414, 64.5 points) and Qwen2.5-VL-3B-Instruct (64.5 points), becoming a performance benchmark for edge-side multimodal models. Compared to its predecessor MiniCPM-V2.6 (810 million parameters, 65.2 points), the new model significantly improves performance while drastically reducing parameters, fully demonstrating FaceWall Intelligence's deep technical expertise in model compression and optimization.

Enhanced Multimodal Capabilities: Full Support for Vision, Text, and Video

MiniCPM-V4.5 supports single-image, multi-image, and video understanding, and excels in high-resolution image processing, OCR (optical character recognition), and multilingual support.

Visual Capabilities: The model can process images up to 1.8 million pixels (1344x1344), supports any aspect ratio, and its OCR performance exceeds mainstream proprietary models such as GPT-4o and Gemini 1.5 Pro on the OCRBench benchmark.
Multi-Image and Video Understanding: In benchmarks such as Mantis-Eval, BLINK, and Video-MME, MiniCPM-V4.5 demonstrates leading capabilities in multi-image reasoning and video spatiotemporal information processing, suitable for content analysis in complex scenarios.
Multilingual Support: Building on the multilingual strengths of the MiniCPM series, the model supports over 30 languages, including English, Chinese, German, French, Italian, Korean, providing seamless multimodal interaction experiences for users worldwide.

Efficient Deployment: Optimized for Edge Devices

MiniCPM-V4.5 is a model of efficiency. Thanks to its high token density (processing 1.8 million pixel images requires only 640 visual tokens, a 75% reduction compared to most models), the model has significant optimizations in inference speed, first-token latency, memory usage, and power consumption. Testing shows that on the iPhone 16 Pro Max, MiniCPM-V4.5 achieves a first-token latency of less than 2 seconds, a decoding speed of over 17 tokens/s, and no noticeable overheating issues. This makes it easy to deploy on smartphones, tablets, and other edge devices, meeting the needs of mobile, offline, and privacy protection scenarios.

In addition, MiniCPM-V4.5 supports various deployment methods, including llama.cpp, Ollama, vLLM, and SGLang, and provides iOS app support, greatly lowering the entry barrier for developers.

Open Ecosystem: Promoting Academic and Business Innovation

FaceWall Intelligence continues its tradition of open-source code. MiniCPM-V4.5 is released under the Apache 2.0 license, completely open-sourced for academic researchers, and commercial users can use it free of charge after simple registration. This initiative further reduces the barriers to multimodal AI, promoting the dual development of academic research and business applications. To date, the MiniCPM series has accumulated over one million downloads on GitHub and HuggingFace, becoming a benchmark model in the field of edge AI.

The release of MiniCPM-V4.5 not only demonstrates FaceWall Intelligence's leading position in the field of multimodal large models but also points the way for the popularization of edge AI. From real-time video analysis to smart document processing, and from multilingual interaction, the wide applicability of MiniCPM-V4.5 brings new possibilities to industries such as education, healthcare, and content creation.

AIbase believes that with the rapid improvement of edge-side computing power and continuous optimization of model efficiency, MiniCPM-V4.5 is expected to become the "new normal" on edge devices, comparable to cloud-based AI.

Latest AI News

AI Daily Brief

AI Product Finder

AI Product Rankings

AI Product Submit

AI Tools Directory

GEO Brand Visibility

AI Visibility Audit

AI Search Visibility Checker

GEO Promotion Link Detection

GEO Ranking Optimization System

GEO Ranking Optimization

MCP Servers

MCP Client

MCP Case Tutorials

MCP Ranking

MCP Service Submission

MCP Playground

MCP Inspector

LLM API Hub

AI Models Finder

Model Providers

LLM Leaderboard

Compare LLMs

LLM Cost Calculator

LLM Arena

AI Model Compatibility Checker

AI Deployment Calculator

Smartphones Can Also Run! Mingsheng Intelligence Launches MiniCPM-V4.5: 410 Million Parameters Outperform GPT-4.1-mini

AIbase基地

Technical Breakthrough: Smaller Parameters, Stronger Performance

Enhanced Multimodal Capabilities: Full Support for Vision, Text, and Video

Efficient Deployment: Optimized for Edge Devices

Open Ecosystem: Promoting Academic and Business Innovation

This article is from AIbase Daily

AI News Recommendations

Tencent Cloud TokenHub Launches Preview Version of DeepSeek-V4 with Full Support for Million-Level Context

AI Daily: DeepSeek-V4 Preview Version Officially Released; Tesla In-Car Voice Accesses Doubao; Meituan Secretly Trials a Trillion-Level AI Large Model

Hong Kong Stock Market's Large Model Stocks Plunge! Zhipu and Minimax Suffer Heavy Losses After Deepseek V4 Release

Cambricon Announces Full Series Model Day0 Compatibility and Open-Sourcing of Optimized Code for DeepSeek-V4

Cambrian Successfully Compatible with DeepSeek-V4 Promotes Efficient AI Model Operation

DeepSeek-V4 Released! Performance Approaching Top Closed-Source Models, Million-Context, Starting at 1 Yuan

Kunlun Tech Launches 4+3 Strategy: From Technical Foundation to Business Cycle

DeepSeek-V4 Preview Version Officially Released: 1M Long Context Enters the Era of Universal Access

DeepSeek V4 Officially Released: DeepSeek-V4-Flash and DeepSeek-V4-Pro Dual Versions Pricing Revealed

Xiaomi Launches Full-Chain Speech Large Model MiMo-V2.5 TTS Can Generate New Voice Models with a Single Sentence ASR Open Source Supports Dialects and Multilingual Mixtures