Recently, NVIDIA, in collaboration with the Massachusetts Institute of Technology (MIT) and the University of Hong Kong, released Fast-dLLM, a new framework designed to sharply accelerate inference for diffusion-based large language models (diffusion LLMs). It delivers speedups of up to 27.6x, giving these models a firmer footing in practical AI applications.
The Challenges and Opportunities of Diffusion Models
Diffusion models are regarded as strong challengers to traditional autoregressive models. Because they use bidirectional attention, they can in principle generate multiple tokens in parallel and therefore decode faster. In practice, however, they often trail autoregressive models in inference speed: every generation step must recompute attention states for the entire sequence (bidirectional attention rules out the standard causal KV cache), which drives up computational cost. In addition, when several tokens are decoded at once, the dependencies between them are easily broken, hurting the quality of the generated output.
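To make the recomputation cost concrete, here is a minimal toy sketch of a plain diffusion-style decoding loop. The `toy_forward` denoiser is a random stand-in invented purely for illustration (it is not a real model and not Fast-dLLM code); the point is only that every step re-runs a full forward pass over the whole sequence, and that each position is filled in independently of the others.

```python
# Toy sketch of a baseline diffusion-LLM decoding loop (illustration only).
# toy_forward is a random stand-in for a bidirectional denoiser, not a real model.
import torch

VOCAB, MASK_ID = 1000, -1

def toy_forward(tokens: torch.Tensor) -> torch.Tensor:
    """Stand-in for a bidirectional denoiser: returns logits for every position."""
    torch.manual_seed(abs(int(tokens.sum())))   # deterministic toy behaviour
    return torch.randn(tokens.shape[0], VOCAB)

def naive_decode(prompt: list[int], gen_len: int, steps: int) -> list[int]:
    seq = torch.tensor(prompt + [MASK_ID] * gen_len)
    masked = [i for i, t in enumerate(seq.tolist()) if t == MASK_ID]
    per_step = max(1, len(masked) // steps)
    while masked:
        logits = toy_forward(seq)           # full attention recompute, every step
        for i in masked[:per_step]:         # unmask a fixed budget of positions
            seq[i] = logits[i].argmax()     # each position is filled independently
        masked = masked[per_step:]
    return seq.tolist()

print(naive_decode([5, 17, 42], gen_len=8, steps=4))
```

The independent per-position fill is exactly where the dependency problem comes from, and the repeated full forward pass is the cost that Fast-dLLM's caching targets.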
Innovations in the Fast-dLLM Framework
To address these issues, the NVIDIA team developed the Fast-dLLM framework around two key innovations: a block-wise approximate KV cache and a confidence-aware parallel decoding strategy.
1. **Block-wise Approximate KV Cache**: The sequence is divided into blocks, and the KV activations for each block are precomputed, stored, and reused across subsequent decoding steps, which sharply reduces redundant computation. The DualCache variant goes further and caches both prefix and suffix tokens, exploiting the high similarity between adjacent inference steps.
2. **Confidence-Aware Parallel Decoding**: At each step, only the tokens whose confidence exceeds a set threshold are decoded in parallel. This avoids the dependency conflicts that arise when all positions are sampled at once and preserves the quality of the generated output (see the sketch after this list).
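The sketch below shows one way these two ideas can be combined in a single decoding loop. The model interface (`compute_kv`, `denoise_with_cache`) is a hypothetical stand-in invented for illustration, not Fast-dLLM's actual API, and only the prefix cache is shown (DualCache additionally caches the suffix); treat it as a sketch of the control flow under those assumptions rather than a faithful implementation.

```python
# Sketch: block-wise approximate KV cache + confidence-aware parallel decoding.
# compute_kv / denoise_with_cache are hypothetical stand-ins, not NVIDIA's API.
import torch

VOCAB, MASK_ID = 1000, -1

def compute_kv(tokens: torch.Tensor) -> torch.Tensor:
    """Stand-in for precomputing and storing KV activations of a context span."""
    torch.manual_seed(abs(int(tokens.sum())))
    return torch.randn(tokens.shape[0], 16)              # fake per-token KV states

def denoise_with_cache(block: torch.Tensor, cache: torch.Tensor) -> torch.Tensor:
    """Stand-in forward pass that attends to the cached context plus the block."""
    torch.manual_seed(abs(int(block.sum()) + int(cache.sum())))
    return torch.randn(block.shape[0], VOCAB).softmax(-1)

def decode(prompt: list[int], gen_len: int, block_size: int = 4, tau: float = 0.9):
    seq = torch.tensor(prompt + [MASK_ID] * gen_len)
    for start in range(len(prompt), len(seq), block_size):
        end = min(start + block_size, len(seq))
        # Block-wise approximate KV cache: computed once per block and reused
        # for every denoising step inside the block (suffix caching omitted).
        cache = compute_kv(seq[:start])
        while (seq[start:end] == MASK_ID).any():
            probs = denoise_with_cache(seq[start:end], cache)
            conf, pred = probs.max(dim=-1)
            masked = seq[start:end] == MASK_ID
            # Confidence-aware parallel decoding: accept every masked position
            # whose confidence clears the threshold tau, but always at least
            # the single most confident one so each step makes progress.
            accept = masked & (conf > tau)
            if not accept.any():
                accept[torch.where(masked, conf, torch.tensor(-1.0)).argmax()] = True
            seq[start:end] = torch.where(accept, pred, seq[start:end])
    return seq.tolist()

print(decode([5, 17, 42], gen_len=8))
```

The cache is deliberately approximate: it is refreshed only at block boundaries rather than at every step, trading a small amount of accuracy for large speed gains, in line with the benchmark trade-off described below.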
Outstanding Performance
Fast-dLLM performed strongly across benchmarks. On GSM8K (8-shot), it reached a 27.6x speedup when generating 1,024-token sequences, with 76.0% accuracy. On the MATH benchmark, it achieved a 6.5x speedup at roughly 39.3% accuracy. On HumanEval and MBPP, the speedups were 3.2x and 7.8x, with accuracy holding at about 54.3% and near baseline, respectively. Overall, Fast-dLLM trades only a 1-2 percentage-point drop in accuracy for these large gains in speed.
By tackling both inference efficiency and decoding quality, Fast-dLLM lets diffusion models compete with autoregressive models on real-world language generation tasks and lays the groundwork for broader adoption. As the technique is taken up more widely, we can expect to see more practical AI applications across a range of fields.
Project: https://nvlabs.github.io/Fast-dLLM/