The open-source AI community has released MiniCPM-V 4.5, an 8B-parameter multimodal LLM optimized for mobile devices. It scores 77.2 on OpenCompass, leading open-source models for on-device multimodal AI.
MiniCPM-V 4.5, a multimodal model from ModelBest (面壁智能) and Tsinghua University, combines a SigLIP2-400M vision encoder with the MiniCPM4 language model for greater efficiency in edge-AI applications.
MiniCPM-V 4.0, a 4.1B-parameter multimodal model, achieves a 69.0 OpenCompass score with strong vision performance. Optimized for mobile, it runs smoothly on an iPhone 16 Pro Max and ships with an iOS app and multi-platform deployment tools.
AI updates: Alibaba's Qwen3-4B for mobile approaches 30B-model performance; Xiaohongshu open-sources dots.vlm1 with a NaViT encoder; MiniMax launches Speech 2.5; Midjourney adds HD video; Cursor 1.4 enhances coding; Google's AI search features increase zero-click searches; MiniCPM-V 4.0 matches GPT-4V on mobile; AMD and Qualcomm support gpt-oss edge computing; Tencent open-sources WeKnora; suspected GPT-5 leaks surface; FlowSpeech debuts text-to-speech conversion.
A high-performance multimodal language model for image and video understanding.
openbmb
AgentCPM-GUI is an on-device graphical user interface agent with RFT-enhanced reasoning, capable of operating Chinese- and English-language applications, built on the 8B-parameter MiniCPM-V.
FriendliAI
MiniCPM-V 2.6 is a powerful multimodal large language model that can run efficiently on devices such as mobile phones and supports single-image, multi-image, and video understanding tasks.
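As a sketch of how an entry like this is typically run, the snippet below follows the single-image usage pattern published on the openbmb/MiniCPM-V-2_6 model card (Hugging Face transformers with trust_remote_code); the chat() method comes from the model's remote code, and exact signatures may differ between model versions.

```python
# Minimal single-image chat with MiniCPM-V 2.6, following the pattern on the
# official model card. Assumes a CUDA GPU; chat() is provided by the model's
# remote code, not by the transformers library itself.
import torch
from PIL import Image
from transformers import AutoModel, AutoTokenizer

model = AutoModel.from_pretrained(
    "openbmb/MiniCPM-V-2_6",
    trust_remote_code=True,
    torch_dtype=torch.bfloat16,
).eval().cuda()
tokenizer = AutoTokenizer.from_pretrained(
    "openbmb/MiniCPM-V-2_6", trust_remote_code=True
)

image = Image.open("example.jpg").convert("RGB")
msgs = [{"role": "user", "content": [image, "What is in this image?"]}]

# Remote-code helper: renders the multimodal prompt and generates a reply.
answer = model.chat(image=None, msgs=msgs, tokenizer=tokenizer)
print(answer)
```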
c01zaut
MiniCPM-V 2.6 is a GPT-4V-level multimodal large language model supporting single-image, multi-image, and video understanding, optimized for the RK3588 NPU.
AI-Engine
A GGUF-quantized build of MiniCPM-V-2_6 that enables efficient image-to-text inference through llama.cpp.
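To illustrate how a GGUF build like this is typically consumed, here is a hedged sketch using the llama-cpp-python bindings. It assumes a recent build that ships MiniCPMv26ChatHandler, and the model and mmproj file names are placeholders for whatever the quantized release actually provides.

```python
# Sketch: local image-to-text with a MiniCPM-V 2.6 GGUF via llama-cpp-python.
# Assumes a llama-cpp-python version that includes MiniCPMv26ChatHandler;
# file paths below are hypothetical placeholders.
import base64
from llama_cpp import Llama
from llama_cpp.llama_chat_format import MiniCPMv26ChatHandler

def image_to_data_uri(path: str) -> str:
    # Encode a local image as a base64 data URI, the form the handler accepts.
    with open(path, "rb") as f:
        return "data:image/jpeg;base64," + base64.b64encode(f.read()).decode()

chat_handler = MiniCPMv26ChatHandler(clip_model_path="mmproj-model-f16.gguf")
llm = Llama(
    model_path="ggml-model-Q4_K_M.gguf",  # quantized language-model weights
    chat_handler=chat_handler,            # wires in the vision projector
    n_ctx=4096,                           # room for image tokens plus text
)

response = llm.create_chat_completion(
    messages=[{
        "role": "user",
        "content": [
            {"type": "image_url",
             "image_url": {"url": image_to_data_uri("example.jpg")}},
            {"type": "text", "text": "Describe this image."},
        ],
    }]
)
print(response["choices"][0]["message"]["content"])
```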
jchevallard
MiniCPM-V 2.6 is the latest and most capable multimodal large model in the MiniCPM-V series, supporting single-image, multi-image, and video understanding with leading performance and high efficiency.
gaianet
MiniCPM-V-2_6 is a visual question answering model supporting both Chinese and English, specializing in vision-related QA tasks.
MiniCPM-V is a mobile GPT-4V-level multimodal large language model that supports single-image, multi-image, and video understanding, with strong visual understanding and optical character recognition (OCR) capabilities.
MiniCPM-V 2.6 is a multimodal vision-language model supporting image-to-text conversion with multilingual processing capabilities.
RhapsodyAI
An OCR-free visual document embedding model that understands document content directly from page images and produces representation vectors, suited to retrieval over text-heavy and visually rich documents.
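To make the retrieval workflow concrete, below is a minimal sketch of dense retrieval with such an embedding model. The encode_query and encode_document helpers are hypothetical stand-ins for the model-specific calls documented on the model card; only the ranking logic is generic.

```python
# Sketch: OCR-free document retrieval with a visual embedding model.
# encode_query / encode_document are HYPOTHETICAL wrappers around the
# model-card API; the cosine-similarity ranking below is generic.
import numpy as np
from PIL import Image

def rank_pages(query_vec: np.ndarray, page_vecs: np.ndarray) -> np.ndarray:
    # Cosine similarity between one query vector and N page vectors,
    # returning page indices sorted best-match first.
    q = query_vec / np.linalg.norm(query_vec)
    p = page_vecs / np.linalg.norm(page_vecs, axis=1, keepdims=True)
    return np.argsort(-(p @ q))

# query_vec = encode_query("When was the contract signed?")            # hypothetical
# page_vecs = np.stack([encode_document(Image.open(f)) for f in pages])  # hypothetical
# print(rank_pages(query_vec, page_vecs)[:5])
```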
MiniCPM-V 2.6 is a multimodal large model released by OpenBMB that surpasses GPT-4V on single-image, multi-image, and video understanding tasks and supports real-time video understanding on an iPad.
MiniCPM-V 2.0 is a powerful multimodal large language model designed for efficient on-device deployment, built on SigLip-400M and MiniCPM-2.4B and connected by a perceiver resampler.
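For intuition about the connector named here, below is a minimal PyTorch sketch of a Flamingo-style perceiver resampler: a fixed set of learned query vectors cross-attends to the vision encoder's output, compressing a variable-length patch sequence into a short, fixed-length sequence of visual tokens for the language model. Dimensions and sizes are illustrative, not the model's actual configuration.

```python
# Minimal perceiver-resampler sketch (Flamingo-style): learned latent queries
# cross-attend over variable-length vision features and compress them into a
# fixed number of tokens. Illustrative sizes only.
import torch
import torch.nn as nn

class PerceiverResampler(nn.Module):
    def __init__(self, dim: int = 1024, num_queries: int = 64, num_heads: int = 8):
        super().__init__()
        self.queries = nn.Parameter(torch.randn(num_queries, dim) * 0.02)
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm_q = nn.LayerNorm(dim)
        self.norm_kv = nn.LayerNorm(dim)

    def forward(self, vision_feats: torch.Tensor) -> torch.Tensor:
        # vision_feats: (batch, n_patches, dim); n_patches may vary per image.
        b = vision_feats.size(0)
        q = self.norm_q(self.queries).expand(b, -1, -1)
        kv = self.norm_kv(vision_feats)
        out, _ = self.attn(q, kv, kv)  # (batch, num_queries, dim)
        return out

feats = torch.randn(2, 576, 1024)         # e.g. a 24x24 patch grid from a ViT
print(PerceiverResampler()(feats).shape)  # torch.Size([2, 64, 1024])
```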
MiniCPM-V is an efficient lightweight multimodal model optimized for edge device deployment, supporting bilingual (Chinese-English) interaction and outperforming models of similar scale.