At the recent Ant Technology Day, Ant Group's Bai Ling team announced a significant decision: to fully open-source its multi-modal large model Ming-lite-omni. This marks another major open-source move by Ant Group in the AI field, and the industry regards Ming-lite-omni as the first open-source model that can rival GPT-4o in terms of modality support.


A 22-billion-parameter technical breakthrough

Ming-lite-omni is built on Ling-lite and adopts a Mixture-of-Experts (MoE) architecture, with 22 billion total parameters and 3 billion active parameters. This parameter scale sets a new high among open-source multi-modal models and reflects Ant Group's deep accumulation in large model technology.
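The gap between total and active parameters comes from the MoE design: a router sends each token to only a few experts, so only a small subset of the weights participates in any single forward pass. Below is a minimal, illustrative PyTorch sketch of sparse expert routing; the dimensions, expert count, and top-2 routing are placeholder choices for illustration, not Ming-lite-omni's actual implementation.

```python
import torch
import torch.nn as nn

class SparseMoELayer(nn.Module):
    """Toy Mixture-of-Experts layer: a router sends each token to its
    top-k experts, so only a fraction of the layer's parameters is
    active for any given token."""

    def __init__(self, d_model=512, d_ff=2048, num_experts=16, top_k=2):
        super().__init__()
        self.top_k = top_k
        self.router = nn.Linear(d_model, num_experts, bias=False)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
            for _ in range(num_experts)
        )

    def forward(self, x):                      # x: (num_tokens, d_model)
        scores = self.router(x)                # (num_tokens, num_experts)
        weights, idx = scores.topk(self.top_k, dim=-1)
        weights = weights.softmax(dim=-1)      # normalize over the chosen experts
        out = torch.zeros_like(x)
        for slot in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = idx[:, slot] == e       # tokens whose slot-th choice is expert e
                if mask.any():
                    out[mask] += weights[mask, slot].unsqueeze(-1) * expert(x[mask])
        return out

layer = SparseMoELayer()
total = sum(p.numel() for p in layer.parameters())
# Per token, only the router plus its top-2 experts do the work.
active = sum(p.numel() for p in layer.router.parameters()) + \
         2 * sum(p.numel() for p in layer.experts[0].parameters())
print(f"total parameters: {total:,}  active per token (approx.): {active:,}")
```

The same principle scales up in a model like Ming-lite-omni: roughly 22 billion parameters are stored, but only about 3 billion are used to process each token.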

Currently, the model weights and inference code for Ming-lite-omni have been fully open-sourced, and the training code and training data will be released in subsequent stages, providing comprehensive technical support for developers worldwide.

Ongoing open-source strategy shows results

This year, the Bai Ling large model team has open-sourced a series of models, including the large language models Ling-lite and Ling-plus, the multi-modal large model Ming-lite-uni, and a preview version of Ming-lite-omni.

Ling-lite-1.5, open-sourced in mid-May, approaches the state of the art at its scale, with performance between Qwen's 4B and 8B models, successfully verifying the feasibility of training a 300B-parameter SOTA MoE large language model on non-high-end computing platforms.

Performance comparable to international top-tier models

In multiple understanding and generation benchmarks, Ming-lite-omni performs on par with or better than leading multi-modal large models at the 10B scale. Ant Group stated that, as far as it knows, this is the first open-source model to rival GPT-4o in modality support, providing global developers with an important technical option and reference standard.

Xiting, head of the Bai Ling large model team, described the team's technical route: "We have consistently adopted the MoE architecture for both large language models and multi-modal large models, and we make extensive use of non-high-end computing platforms, successfully demonstrating that domestic GPUs can train models comparable to GPT-4o."