On October 13, Ant Group officially open-sourced dInfer, the industry's first high-performance inference framework for diffusion language models (dLLMs).

In benchmark tests, dInfer delivered a 10.7x speedup in diffusion-language-model inference over NVIDIA's diffusion framework Fast-dLLM. On the HumanEval code-generation benchmark, dInfer reached 1,011 tokens/second in single-batch inference, making it the first open-source framework to run a diffusion language model significantly faster than comparable autoregressive models in single-batch decoding. The results show that the efficiency potential of diffusion language models is substantial and can be unlocked through systematic engineering innovation, making them a highly competitive option on the architectural path toward AGI.

Diffusion language models are a new paradigm that treats text generation as a "denoising" process: a complete sequence is gradually recovered from random noise. This gives them three major advantages: high parallelism, a global view of the sequence, and structural flexibility. Building on these advantages, models such as LLaDA-MoE, released by Ant Group and Renmin University of China, have reached accuracy comparable to top autoregressive (AR) models across multiple benchmarks. In inference efficiency, however, the strong theoretical potential of dLLMs has long been held back by harsh realities. Efficient dLLM inference faces three major challenges: high computational cost, the breakdown of standard KV caching, and the difficulty of parallel decoding. These bottlenecks have kept the inference speed of diffusion language models unsatisfactory, and breaking free of them to unlock the efficiency potential of dLLMs has become a pressing problem in the field.
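To make the "denoising" picture concrete, here is a minimal sketch of mask-based parallel decoding in the style used by LLaDA-family dLLMs. The random-logits model stand-in, the vocabulary size, and the confidence-based unmasking schedule are illustrative assumptions, not dInfer's actual algorithm.

```python
import torch

# Toy illustration of mask-based diffusion decoding (LLaDA-style).
# The "model" below just returns random logits; a real dLLM predicts
# token distributions for every masked position in parallel.
VOCAB, SEQ_LEN, STEPS = 1000, 32, 8
MASK_ID = VOCAB  # mask token sits outside the normal vocabulary here

def model(tokens):
    # placeholder denoiser (assumption): one logit vector per position
    return torch.randn(tokens.shape[0], VOCAB)

def diffusion_decode(seq_len=SEQ_LEN, steps=STEPS):
    tokens = torch.full((seq_len,), MASK_ID)   # start fully "noised" (all masked)
    for step in range(steps):
        logits = model(tokens)                 # predict all positions in parallel
        probs, preds = logits.softmax(-1).max(-1)
        masked = tokens == MASK_ID
        # commit only the most confident masked positions this round,
        # unmasking roughly 1/steps of the sequence per iteration
        k = max(1, int(masked.sum()) // (steps - step))
        conf = torch.where(masked, probs, torch.full_like(probs, -1.0))
        commit = conf.topk(k).indices
        tokens[commit] = preds[commit]
    return tokens

print(diffusion_decode())
```

Each iteration scores every masked position at once, which is where the parallelism of dLLMs comes from; the price is that every iteration is a full forward pass over the whole sequence, which is exactly the computational-cost and caching challenge described above.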

dInfer is a high-performance inference framework purpose-built for diffusion language models, with deep co-design of algorithms and systems. It supports multiple diffusion language models, including LLaDA, LLaDA-MoE, and LLaDA-MoE-TD.

dInfer comprises four core modules: Model, KV-Cache Manager, Iteration Manager, and Decoder. This plug-and-play architecture lets developers mix and match optimization strategies for each module like LEGO bricks and run standardized evaluations on a unified platform. More importantly, each module integrates targeted solutions to the three challenges above.

(Figure: Architecture of dInfer)
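As a rough illustration of how such a modular layout can be composed, the sketch below wires four stand-in modules into one pipeline. The class names, method signatures, and the toy denoising logic are hypothetical and do not reflect dInfer's actual API.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical sketch of a plug-and-play dLLM pipeline: four modules with
# narrow interfaces, swapped independently. Names and logic are illustrative.

class Model:
    def denoise(self, tokens: List[int]) -> List[int]:
        # toy "denoiser": replace masks (-1) with a dummy token id
        return [t if t != -1 else 42 for t in tokens]

class KVCacheManager:
    def refresh(self, step: int) -> None:
        pass  # a real manager decides when cached KV states can be reused

class IterationManager:
    def positions_to_unmask(self, tokens: List[int], step: int, steps: int) -> List[int]:
        masked = [i for i, t in enumerate(tokens) if t == -1]
        return masked[: max(1, len(masked) // (steps - step))]

class Decoder:
    def commit(self, tokens: List[int], proposal: List[int], positions: List[int]) -> List[int]:
        for i in positions:
            tokens[i] = proposal[i]
        return tokens

@dataclass
class Pipeline:
    model: Model
    cache: KVCacheManager
    iterator: IterationManager
    decoder: Decoder

    def generate(self, length: int = 16, steps: int = 4) -> List[int]:
        tokens = [-1] * length                      # fully masked sequence
        for step in range(steps):
            self.cache.refresh(step)
            proposal = self.model.denoise(tokens)   # parallel prediction
            positions = self.iterator.positions_to_unmask(tokens, step, steps)
            tokens = self.decoder.commit(tokens, proposal, positions)
        return tokens

print(Pipeline(Model(), KVCacheManager(), IterationManager(), Decoder()).generate())
```

Because each module only talks to the others through a narrow interface, a new cache policy or decoding rule can be benchmarked by swapping in a single component, which is the "LEGO brick" property the framework emphasizes.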

On a node equipped with 8 NVIDIA H800 GPUs, dInfer's performance is impressive:

Compared with Fast-dLLM, the previous dLLM inference solution, dInfer delivers a 10.7x higher average inference speed (681 vs. 63.6 avg TPS) at comparable model quality; on the HumanEval code-generation task, it reaches 1,011 tokens/second in single-batch inference. Against Qwen2.5-3B, an AR model of similar size and quality running on the industry-leading inference framework vLLM, dInfer's average inference speed is 2.5x higher (681 vs. 277 avg TPS).

Ant Group stated that dInfer connects cutting-edge research with industrial application, marking a crucial step for diffusion language models from "theoretical feasibility" to "practical efficiency." With this open-source release, it also invites developers and researchers worldwide to jointly explore the potential of diffusion language models and build a new, more efficient, and more open AI ecosystem.