FaceWall Intelligence, in collaboration with the NLP Lab of Tsinghua University, officially launched its latest edge-side multimodal large model MiniCPM-V4.5, marking a new height in edge AI technology.

image.png

This latest masterpiece of the MiniCPM series sets new standards for edge-side multimodal models with its outstanding performance, efficient deployment capabilities, and wide application scenarios. Below, AIbase provides a detailed analysis of this breakthrough technology.

image.png

Technical Breakthrough: Smaller Parameters, Stronger Performance

MiniCPM-V4.5 is built on the SigLIP2-400M visual module and the MiniCPM4-3B language model, with a total parameter count of only 410 million. It has shown impressive performance in multiple benchmark tests. According to official data, MiniCPM-V4.5 achieved an average score of 69.0 in the OpenCompass comprehensive evaluation, surpassing GPT-4.1-mini (version 20250414, 64.5 points) and Qwen2.5-VL-3B-Instruct (64.5 points), becoming a performance benchmark for edge-side multimodal models. Compared to its predecessor MiniCPM-V2.6 (810 million parameters, 65.2 points), the new model significantly improves performance while drastically reducing parameters, fully demonstrating FaceWall Intelligence's deep technical expertise in model compression and optimization.

Enhanced Multimodal Capabilities: Full Support for Vision, Text, and Video

MiniCPM-V4.5 supports single-image, multi-image, and video understanding, and excels in high-resolution image processing, OCR (optical character recognition), and multilingual support.

  • Visual Capabilities: The model can process images up to 1.8 million pixels (1344x1344), supports any aspect ratio, and its OCR performance exceeds mainstream proprietary models such as GPT-4o and Gemini 1.5 Pro on the OCRBench benchmark.
  • Multi-Image and Video Understanding: In benchmarks such as Mantis-Eval, BLINK, and Video-MME, MiniCPM-V4.5 demonstrates leading capabilities in multi-image reasoning and video spatiotemporal information processing, suitable for content analysis in complex scenarios.
  • Multilingual Support: Building on the multilingual strengths of the MiniCPM series, the model supports over 30 languages, including English, Chinese, German, French, Italian, Korean, providing seamless multimodal interaction experiences for users worldwide.

Efficient Deployment: Optimized for Edge Devices

MiniCPM-V4.5 is a model of efficiency. Thanks to its high token density (processing 1.8 million pixel images requires only 640 visual tokens, a 75% reduction compared to most models), the model has significant optimizations in inference speed, first-token latency, memory usage, and power consumption. Testing shows that on the iPhone 16 Pro Max, MiniCPM-V4.5 achieves a first-token latency of less than 2 seconds, a decoding speed of over 17 tokens/s, and no noticeable overheating issues. This makes it easy to deploy on smartphones, tablets, and other edge devices, meeting the needs of mobile, offline, and privacy protection scenarios.

In addition, MiniCPM-V4.5 supports various deployment methods, including llama.cpp, Ollama, vLLM, and SGLang, and provides iOS app support, greatly lowering the entry barrier for developers.

Open Ecosystem: Promoting Academic and Business Innovation

FaceWall Intelligence continues its tradition of open-source code. MiniCPM-V4.5 is released under the Apache 2.0 license, completely open-sourced for academic researchers, and commercial users can use it free of charge after simple registration. This initiative further reduces the barriers to multimodal AI, promoting the dual development of academic research and business applications. To date, the MiniCPM series has accumulated over one million downloads on GitHub and HuggingFace, becoming a benchmark model in the field of edge AI.

The release of MiniCPM-V4.5 not only demonstrates FaceWall Intelligence's leading position in the field of multimodal large models but also points the way for the popularization of edge AI. From real-time video analysis to smart document processing, and from multilingual interaction, the wide applicability of MiniCPM-V4.5 brings new possibilities to industries such as education, healthcare, and content creation.

AIbase believes that with the rapid improvement of edge-side computing power and continuous optimization of model efficiency, MiniCPM-V4.5 is expected to become the "new normal" on edge devices, comparable to cloud-based AI.

Project: https://huggingface.co/openbmb/MiniCPM-V-4_5