On June 6, ModelBest (Mianbi Intelligence) officially launched its latest flagship, the MiniCPM4.0 series, billed as "the most imaginative little powerhouse in history." The series not only delivers a leap in on-device performance but also sets a new benchmark for technological innovation.
The MiniCPM4.0 series comprises two headline models: the 8B Lightning Sparse Edition, which has sparked an efficiency storm with its innovative sparse architecture, and the lightweight, agile 0.5B version, billed as the "strongest little powerhouse." Both models stand out in speed, efficiency, raw capability, and practical application.
In terms of speed, MiniCPM4.0 delivers up to a 220x inference speedup in extreme cases and a 5x speedup in typical scenarios. The gain comes from layer-by-layer acceleration enabled by system-level sparsity innovations. Through its efficient "dual-frequency shifting" mechanism, the model automatically switches between sparse and dense attention depending on text length, keeping long-text processing fast while sharply reducing on-device memory demands: compared with the similarly sized Qwen3-8B, it needs only 1/4 of the KV-cache storage, as sketched below.
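The announcement does not publish the switching logic, but the idea of "dual-frequency shifting" can be illustrated with a minimal sketch: dispatch to dense attention for short contexts and to sparse attention over a selected subset of KV-cache entries for long ones. The threshold, function name, and tensor layout below are hypothetical, not ModelBest's implementation.

```python
import torch
import torch.nn.functional as F

# Hypothetical threshold: below it, dense attention is cheap enough;
# above it, switch to sparse attention over selected KV entries.
SPARSE_SWITCH_LEN = 8192

def dual_mode_attention(q, k, v, sparse_ids=None):
    """Dispatch between dense and sparse attention by context length.

    q, k, v:    (batch, heads, seq_len, head_dim) tensors.
    sparse_ids: 1-D LongTensor of KV positions kept by the sparse path.
    """
    seq_len = k.shape[-2]
    if seq_len <= SPARSE_SWITCH_LEN or sparse_ids is None:
        # Short context: standard dense attention over the full KV cache.
        return F.scaled_dot_product_attention(q, k, v)
    # Long context: attend only to the selected subset of KV entries,
    # shrinking both compute and cache traffic.
    k_sel = k[..., sparse_ids, :]
    v_sel = v[..., sparse_ids, :]
    return F.scaled_dot_product_attention(q, k_sel, v_sel)
```

Because the sparse path reads only a fraction of the cached keys and values, the effective KV-cache traffic shrinks proportionally, which is the kind of saving behind the reported 1/4 cache footprint versus a dense 8B model.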
In terms of efficiency, MiniCPM4.0 contributes the industry's first fully open-source, system-level context-sparsity stack, reaching extreme acceleration at a 5% attention sparsity rate. It integrates self-developed innovations spanning the architecture, system, inference, and data layers, realizing an end-to-end, hardware-aware sparse design in software and hardware alike.
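A 5% sparsity rate means each query attends to only about 5% of the cached context. One common way to realize this, shown here purely as an assumed illustration (the summary-key scheme and names are not from the announcement), is to score coarse blocks of the KV cache against the current query and keep only the top-scoring fraction:

```python
import torch

def select_kv_blocks(query, block_keys, sparsity=0.05):
    """Pick the most relevant ~5% of KV-cache blocks for one query.

    query:      (head_dim,) current query vector.
    block_keys: (num_blocks, head_dim) one summary key per cached block
                (e.g., the mean of the keys inside each block).
    Returns indices of the blocks the sparse attention path will read.
    """
    scores = block_keys @ query                  # relevance per block
    k = max(1, int(sparsity * block_keys.shape[0]))
    top = torch.topk(scores, k).indices          # keep the top-5% blocks
    return torch.sort(top).values                # restore cache order
```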
In terms of capability, MiniCPM4.0 continues the "small but powerful" tradition. The 0.5B version delivers twice the performance of comparable models at half the parameter count, trained at only 2.7% of the usual cost; the 8B sparse version, at 22% of the training cost, matches or surpasses Qwen3 and Gemma 3 12B, consolidating its lead in the on-device field.
In practical application, MiniCPM4.0 shows real strength. Combining the self-developed CPM.cu ultra-fast edge inference framework with innovations in speculative sampling, model compression and quantization, and edge deployment, it cuts model size by 90% while maximizing inference speed, ensuring a smooth end-to-end on-device experience.
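For readers unfamiliar with speculative sampling: a small draft model proposes several tokens cheaply, and the large target model verifies them all in a single forward pass, so the big model runs far fewer steps. The sketch below is a simplified greedy variant of the general technique, not CPM.cu's actual algorithm; `draft` and `target` stand for any two causal LMs sharing a tokenizer.

```python
import torch

@torch.no_grad()
def speculative_decode(draft, target, ids, n_draft=4, max_len=64):
    """Greedy speculative decoding sketch.

    ids: (1, prompt_len) LongTensor of input token IDs.
    Accepts draft tokens only while they match the target's own
    greedy choice; no probabilistic correction, for brevity.
    """
    while ids.shape[1] < max_len:
        # 1) Draft model proposes n_draft tokens autoregressively (cheap).
        proposal = ids
        for _ in range(n_draft):
            logits = draft(proposal).logits[:, -1, :]
            proposal = torch.cat(
                [proposal, logits.argmax(-1, keepdim=True)], dim=1)
        # 2) Target model scores the whole proposal in ONE forward pass.
        tgt = target(proposal).logits.argmax(-1)
        # 3) Accept proposals while they match the target's greedy pick.
        n_prev = ids.shape[1]
        accepted = 0
        for i in range(n_draft):
            if proposal[0, n_prev + i] == tgt[0, n_prev + i - 1]:
                accepted += 1
            else:
                break
        # Keep accepted tokens plus one corrected token from the target,
        # so every round makes progress even if nothing was accepted.
        cut = n_prev + accepted
        ids = torch.cat([proposal[:, :cut], tgt[:, cut - 1:cut]], dim=1)
    return ids
```

When the draft model agrees with the target most of the time, several tokens are emitted per large-model pass, which is where the wall-clock speedup comes from.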
The model has already been adapted to mainstream chips from Intel, Qualcomm, MediaTek (MTK), and Huawei Ascend, and has been deployed on multiple open-source inference frameworks, further broadening its application potential.
Model Collection:
https://www.modelscope.cn/collections/MiniCPM-4-ec015560e8c84d
GitHub:
https://github.com/openbmb/minicpm
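To try the models, a minimal load-and-generate sketch using the standard Hugging Face transformers API is shown below. The model ID is illustrative; check the collection and repository above for the exact names of the released checkpoints.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Assumed checkpoint name; verify against the links above.
model_id = "openbmb/MiniCPM4-8B"

tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype="auto", device_map="auto", trust_remote_code=True)

inputs = tokenizer("Introduce MiniCPM in one sentence.",
                   return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```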