The OpenBMB team recently announced the official open-source release of MiniCPM-V 4.0, a new multimodal large model. With its lightweight architecture and strong performance, it has been dubbed "GPT-4V on a phone" and is expected to bring breakthrough AI capabilities to mobile devices.
The core of MiniCPM-V 4.0 lies in its design: it combines the SigLIP2-400M vision encoder with the MiniCPM4-3B language model, totaling only 4.1B parameters, yet it delivers powerful image, multi-image, and video understanding. It can not only handle single images with ease but also reason over relationships across multiple images and over video clips, giving users a smarter interaction experience.
Despite its small parameter count, MiniCPM-V 4.0's performance is impressive. Across eight mainstream OpenCompass benchmarks, the model achieved an average score of 69.0, surpassing competitors such as GPT-4.1-mini and Qwen2.5-VL-3B. This result demonstrates solid visual-understanding capability, especially in complex scenarios, where its accuracy and depth of analysis stand out.
Another major highlight of MiniCPM-V 4.0 is its deep optimization for mobile devices. In real-world testing on the latest iPhone 16 Pro Max, first-token latency was under 2 seconds and decoding speed exceeded 17 tokens per second, while device heating stayed well controlled during operation, ensuring a smooth and stable user experience. It can also handle concurrent requests, making it practical for phones, tablets, and other edge devices.
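To put those numbers in perspective, here is a rough back-of-the-envelope estimate of how long a reply of a given length would take, assuming the reported ~2-second first-token latency and 17 tokens/second decode rate hold steady (real on-device timings vary with prompt length, quantization, and thermal state):

```python
def estimate_reply_time(num_tokens: int,
                        first_token_latency_s: float = 2.0,
                        decode_tokens_per_s: float = 17.0) -> float:
    """Rough estimate: time to first token plus steady-state decoding.

    The 2 s and 17 tokens/s defaults are the on-device figures reported
    for iPhone 16 Pro Max; they are assumptions for illustration only.
    """
    return first_token_latency_s + num_tokens / decode_tokens_per_s

# A ~100-token answer would take roughly 8 seconds end to end.
print(round(estimate_reply_time(100), 1))  # → 7.9
```

Even under these simplified assumptions, a typical short answer arrives in well under ten seconds, which is consistent with the "smooth interaction" claim.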
To lower the barrier for developers, the OpenBMB team provides rich ecosystem support. MiniCPM-V 4.0 is compatible with mainstream inference frameworks such as llama.cpp, Ollama, and vLLM, giving developers flexible deployment options. The team has also built a dedicated iOS app that runs directly on iPhone and iPad, and released a detailed Cookbook with complete tutorials and code examples.
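As one illustration of local deployment, the sketch below builds a request payload for Ollama's HTTP generate endpoint, which accepts base64-encoded images for multimodal models. The model tag `minicpm-v` is an assumption here; check the Ollama model library for the exact name and version available for MiniCPM-V 4.0.

```python
import base64
import json

def build_vision_request(model: str, prompt: str, image_bytes: bytes) -> dict:
    """Build the JSON payload for Ollama's POST /api/generate endpoint.

    Ollama expects images as base64 strings in the "images" list.
    The model tag passed in (e.g. "minicpm-v") is an assumption and
    may differ from the tag actually published for this release.
    """
    return {
        "model": model,
        "prompt": prompt,
        "images": [base64.b64encode(image_bytes).decode("ascii")],
        "stream": False,
    }

# With a local Ollama server running, this payload would be POSTed to
# http://localhost:11434/api/generate (e.g. via requests.post(url, json=payload)).
payload = build_vision_request("minicpm-v", "Describe this image.", b"...image bytes...")
print(json.dumps(payload)[:40])
```

The same payload shape works for any vision-capable model served by Ollama, which is part of what makes framework compatibility valuable: swapping models is a one-string change.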
The release of MiniCPM-V 4.0 opens up new possibilities for applying multimodal technology. Its main application scenarios include:
Image Analysis and Multi-turn Dialogue: Users can upload an image, have the model analyze its content, and continue the conversation based on it.
Video Understanding: It can analyze video content, providing solutions for scenarios that require processing video information.
OCR and Mathematical Reasoning: The model can recognize text in images and solve mathematical problems, greatly enhancing its practicality in everyday work and study.
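The multi-turn image dialogue described above can be sketched as a growing message history in the role/content format common to multimodal chat APIs. The exact structure MiniCPM-V expects may differ; the field names below are illustrative:

```python
def add_turn(history: list, role: str, content) -> list:
    """Append one chat turn; content may mix text and image references."""
    history.append({"role": role, "content": content})
    return history

history = []
# First turn: an image plus a question about it.
add_turn(history, "user", [{"type": "image", "path": "photo.jpg"},
                           {"type": "text", "text": "What is in this picture?"}])
add_turn(history, "assistant", [{"type": "text", "text": "A cat on a sofa."}])
# The follow-up question reuses the accumulated history,
# so the model can resolve "the cat" from the earlier turns.
add_turn(history, "user", [{"type": "text", "text": "What color is the cat?"}])
print(len(history))  # → 3
```

Multi-turn understanding falls out of this pattern naturally: each new request carries the full history, and the model grounds follow-up questions in the previously uploaded image.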
The open-source release of MiniCPM-V 4.0 not only showcases the capabilities of Chinese AI teams in lightweight model development but also gives developers worldwide a powerful tool for exploring on-device multimodal technology, a solid step toward making AI accessible to everyone.