Ant Group and Renmin University have jointly developed LLaDA-MoE, a diffusion language model (dLLM) with a native MoE architecture, trained from scratch on about 20T tokens of data. The work verifies the scalability and stability of industrial-scale training. The model outperforms the previously released dense diffusion language models LLaDA 1.0/1.5 and Dream-7B, matches autoregressive models of comparable scale, and retains a significant advantage in inference speed. It will be fully open-sourced soon to advance dLLM research across the global AI community.

On September 11, at the 2025 Inclusion·Bund Conference, Ant Group and Renmin University jointly released the industry's first diffusion language model (dLLM) with a native MoE architecture, "LLaDA-MoE." Li Chongxuan, Assistant Professor at the Guangqi Institute of Artificial Intelligence, Renmin University, and Lan Zhenzhong, Director of the General Artificial Intelligence Research Center at Ant Group, Adjunct Researcher at Westlake University, and founder of Westlake Xinchen, attended the launch ceremony.


(Renmin University and Ant Group jointly launched the first MoE architecture diffusion model LLaDA-MoE)

According to the introduction, the new model uses a non-autoregressive masked diffusion mechanism and, for the first time, achieves language intelligence comparable to Qwen2.5 (including in-context learning, instruction following, and code and math reasoning) in a natively trained MoE large language model, challenging the mainstream view that "language models must be autoregressive."
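
To make the mechanism concrete, the sketch below shows how a masked diffusion model can decode in parallel: generation starts from a fully masked response and repeatedly reveals the positions the model is most confident about, with context visible on both sides at every step. This is a minimal illustration under assumed interfaces (`model`, `mask_id`, and the confidence-based schedule are placeholders), not LLaDA-MoE's actual API.

```python
import torch

def masked_diffusion_generate(model, prompt_ids, gen_len=128, steps=32, mask_id=0):
    """Illustrative masked-diffusion decoding loop (not LLaDA-MoE's released API).

    Start from a fully masked response and, over a fixed number of steps,
    fill in the positions the model is most confident about. The model sees
    the whole sequence at every step, so context flows in both directions.
    """
    device = prompt_ids.device
    x = torch.cat([prompt_ids,
                   torch.full((gen_len,), mask_id, dtype=prompt_ids.dtype, device=device)])
    tokens_per_step = max(1, gen_len // steps)

    for _ in range(steps):
        masked = x == mask_id
        if not masked.any():
            break
        logits = model(x.unsqueeze(0)).squeeze(0)            # [seq_len, vocab_size]
        conf, pred = torch.softmax(logits, dim=-1).max(dim=-1)
        conf = torch.where(masked, conf, torch.full_like(conf, -1.0))
        # Reveal the highest-confidence masked positions in parallel.
        k = min(tokens_per_step, int(masked.sum()))
        top = torch.topk(conf, k).indices
        x[top] = pred[top]
    return x[prompt_ids.numel():]
```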

Performance data show that LLaDA-MoE outperforms diffusion language models such as LLaDA 1.0/1.5 and Dream-7B on code, mathematics, and agent tasks, and approaches or surpasses the autoregressive model Qwen2.5-3B-Instruct, matching the performance of an equivalent 3B dense model while activating only 1.4B parameters.
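
The "activate 1.4B of 7B parameters" figure comes from sparse expert routing: each token is dispatched to only a few experts out of many, so the parameters touched per token are a small fraction of the total. The toy layer below illustrates the idea; the sizes, expert count, and top-k value are assumptions for illustration, not LLaDA-MoE's published configuration.

```python
import torch
import torch.nn as nn

class SparseMoELayer(nn.Module):
    """Toy mixture-of-experts feed-forward layer (illustrative only).

    All experts' parameters exist in the model, but each token is routed to
    only top_k of them, so the compute (activated parameters) per token is a
    small fraction of the total -- the idea behind "7B total, 1.4B activated".
    """
    def __init__(self, d_model=1024, d_ff=2816, n_experts=64, top_k=2):
        super().__init__()
        self.router = nn.Linear(d_model, n_experts, bias=False)
        self.experts = nn.ModuleList([
            nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
            for _ in range(n_experts)
        ])
        self.top_k = top_k

    def forward(self, x):                                   # x: [n_tokens, d_model]
        scores = self.router(x).softmax(dim=-1)
        weights, idx = scores.topk(self.top_k, dim=-1)      # per-token expert choice
        weights = weights / weights.sum(dim=-1, keepdim=True)
        out = torch.zeros_like(x)
        for k in range(self.top_k):
            for e in range(len(self.experts)):              # naive dispatch, clarity over speed
                sel = idx[:, k] == e
                if sel.any():
                    out[sel] += weights[sel, k, None] * self.experts[e](x[sel])
        return out
```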


(Performance of LLaDA-MoE)

"The LLaDA-MoE model has verified the scalability and stability of industrial-scale large-scale training, meaning we have taken another step forward in scaling up dLLM to larger scales," said Lan Zhenzhong at the launch event.

Li Chongxuan, Assistant Professor at the Guangqi Institute of Artificial Intelligence, Renmin University, explained: "Over the past two years, the capabilities of large AI models have advanced rapidly, yet some problems remain fundamentally unsolved. The root cause is that the autoregressive generation paradigm prevailing in today's large models is inherently unidirectional: tokens are generated one after another, from left to right, which makes it difficult to capture bidirectional dependencies between tokens."
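
Schematically, the contrast can be written as follows (standard formulations from the dLLM literature, not details specific to LLaDA-MoE): an autoregressive model factorizes the sequence strictly left to right, while a masked diffusion model is trained to recover masked tokens from context on both sides.

```latex
% Autoregressive LM: strictly left-to-right factorization
p_\theta(x) = \prod_{t=1}^{T} p_\theta(x_t \mid x_{<t})

% Masked diffusion LM (schematic): recover the masked positions of a
% partially masked sequence x_t, conditioning on context from both sides
\mathcal{L}(\theta) = -\,\mathbb{E}_{t,\,x_0,\,x_t}\!\left[\frac{1}{t}
    \sum_{i} \mathbf{1}\big[x_t^{\,i} = \texttt{[MASK]}\big]\,
    \log p_\theta\big(x_0^{\,i} \mid x_t\big)\right]
```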

Facing these issues, some researchers have taken a different route, turning to parallel-decoding diffusion language models. However, existing dLLMs are all built on dense architectures, making it hard to replicate the advantage MoE brings to autoregressive models: expanding parameters while keeping computation efficient. Against this backdrop, the joint research team from Ant Group and Renmin University introduced LLaDA-MoE, the first diffusion language model built natively on an MoE architecture.

Lan Zhenzhong also stated, "We will open-source the model weights and our self-developed inference framework to the global community to jointly drive the next breakthrough in AGI."

According to the team, the Ant Group and Renmin University researchers spent about three months rewriting the training code on top of LLaDA-1.0 and using ATorch, Ant's self-developed distributed framework, to provide expert parallelism (EP) and other parallel acceleration techniques. Building on the training data of Ant's Ling 2.0 base model, they overcame core challenges such as load balancing and noise-sampling drift, and completed efficient training on about 20T tokens with the 7B-A1B MoE architecture (7B total parameters, 1.4B activated).
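
The article does not describe how load balancing was solved; as one hypothetical illustration, MoE training commonly adds an auxiliary loss in the style of Switch Transformer that pushes the router toward a uniform spread of tokens across experts, which matters even more under expert parallelism, where an overloaded expert stalls its device.

```python
import torch

def load_balancing_loss(router_probs, expert_idx, n_experts):
    """Switch-Transformer-style auxiliary load-balancing loss (illustrative;
    not LLaDA-MoE's published recipe).

    router_probs: [n_tokens, n_experts] softmax outputs of the router.
    expert_idx:   [n_tokens, top_k] hard expert assignments per token.
    The loss is minimized when tokens are spread uniformly across experts.
    """
    # Fraction of routing decisions dispatched to each expert (hard counts).
    dispatch = torch.zeros(n_experts, device=router_probs.device)
    dispatch.scatter_add_(0, expert_idx.flatten(),
                          torch.ones(expert_idx.numel(), device=router_probs.device))
    dispatch = dispatch / expert_idx.numel()
    # Mean router probability mass assigned to each expert (soft importance).
    importance = router_probs.mean(dim=0)
    return n_experts * torch.sum(dispatch * importance)
```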

Under Ant's self-developed unified evaluation framework, LLaDA-MoE achieved an average improvement of 8.4% across 17 benchmarks including HumanEval, MBPP, GSM8K, MATH, IFEval, and BFCL, led LLaDA-1.5 by 13.2%, and tied with Qwen2.5-3B-Instruct. The experiments once again verified that the "MoE amplifier" effect also holds in the dLLM domain, offering a feasible path toward subsequent 10B–100B sparse models.

According to Lan Zhenzhong, in addition to the model weights, Ant will also open-source its inference engine, which is optimized for the parallel-decoding characteristics of dLLMs and achieves significant acceleration compared with NVIDIA's official fast-dLLM. The related code and technical report will be released soon on GitHub and Hugging Face.