Amid the latest wave of large language model development, the Ant Technology Research Institute has officially released the LLaDA2.0 series, which it describes as the industry's first discrete diffusion large language model (dLLM) at the 100B-parameter scale. The model challenges the common perception that diffusion models are difficult to scale, delivers notable gains in both generation quality and inference speed, and opens a new direction for large language model research.

The LLaDA2.0 series comes in two versions: a 16B "mini" model and a 100B "flash" model. The newly released 100B version is currently the largest diffusion language model and is aimed in particular at complex code generation and instruction-following tasks. According to Ant Group, LLaDA2.0 inherits knowledge from an autoregressive (AR) model through a new Warmup-Stable-Decay (WSD) pre-training strategy, avoiding the high cost of training from scratch.
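The Warmup-Stable-Decay schedule referenced here is, at its core, a three-phase learning-rate curve. Below is a minimal sketch in Python, assuming a linear warmup, a constant stable phase, and a linear decay; the actual phase lengths, peak learning rate, and decay shape used for LLaDA2.0 are not disclosed in the announcement and are illustrative assumptions.

```python
def wsd_lr(step: int, max_lr: float, warmup_steps: int,
           stable_steps: int, decay_steps: int, min_lr: float = 0.0) -> float:
    """Warmup-Stable-Decay learning-rate schedule (illustrative sketch).

    All hyperparameters here are placeholders; LLaDA2.0's actual training
    configuration is not specified in the article.
    """
    if step < warmup_steps:
        # Linear warmup from 0 to the peak learning rate.
        return max_lr * step / max(warmup_steps, 1)
    if step < warmup_steps + stable_steps:
        # Stable phase: hold the peak learning rate constant.
        return max_lr
    # Decay phase: linear decay from max_lr down to min_lr.
    progress = (step - warmup_steps - stable_steps) / max(decay_steps, 1)
    progress = min(progress, 1.0)
    return max_lr + (min_lr - max_lr) * progress


# Example: peak LR 3e-4, 1k warmup steps, 50k stable steps, 5k decay steps.
for s in (0, 500, 1_000, 25_000, 53_500, 56_000):
    print(s, wsd_lr(s, 3e-4, 1_000, 50_000, 5_000))
```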
On the technical side, LLaDA2.0 showcases the advantage of parallel decoding, reaching an inference speed of 535 tokens/s, 2.1 times faster than comparable AR models. The speedup comes from reusing the KV cache and from block-level parallel decoding during inference. In the post-training phase, Ant Group further improved the model's data efficiency and inference speed with complementary masking and confidence-aware parallel (CAP) training.
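To make the decoding idea concrete, here is a minimal sketch of confidence-based block-parallel decoding as it is generally done in diffusion language models: all masked positions in a block are predicted in parallel, and tokens whose confidence clears a threshold are committed while the rest stay masked for the next pass. The model interface, MASK_ID, threshold, and iteration count are assumptions for illustration, not Ant Group's actual implementation, and KV-cache reuse across blocks is omitted.

```python
import torch

MASK_ID = 0  # hypothetical [MASK] token id; the real vocabulary differs


@torch.no_grad()
def decode_block(model, prompt_ids: torch.Tensor, block_len: int,
                 threshold: float = 0.9, max_iters: int = 8) -> torch.Tensor:
    """Illustrative confidence-aware parallel decoding of one block.

    Assumes `model(input_ids)` returns an object with `.logits` of shape
    (batch, seq_len, vocab), as in a Hugging Face-style masked/diffusion LM.
    """
    block = torch.full((1, block_len), MASK_ID, dtype=torch.long)
    seq = torch.cat([prompt_ids, block], dim=1)
    start = prompt_ids.shape[1]

    for _ in range(max_iters):
        masked = seq[0, start:] == MASK_ID
        if not masked.any():
            break  # every position in the block has been committed
        logits = model(seq).logits[0, start:]          # (block_len, vocab)
        probs = torch.softmax(logits, dim=-1)
        conf, pred = probs.max(dim=-1)
        # Commit masked positions whose confidence clears the threshold;
        # always commit at least the single most confident masked token
        # so the loop is guaranteed to make progress.
        accept = masked & (conf >= threshold)
        if not accept.any():
            best = torch.where(masked, conf, torch.full_like(conf, -1.0)).argmax()
            accept[best] = True
        seq[0, start:][accept] = pred[accept]
    return seq
```

Because many tokens can be committed per forward pass instead of one, this style of decoding is what allows a dLLM to outrun token-by-token AR generation when the acceptance rate is high.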
LLaDA2.0 performs strongly across multiple evaluation dimensions, particularly in structured generation tasks such as code generation, where it exhibits stronger global planning ability. It also does well on complex agent tool-calling and long-context tasks, demonstrating its adaptability across diverse application scenarios.
The release marks a milestone for discrete diffusion technology and points to the feasibility and advantages of diffusion models at ultra-large scale. Going forward, Ant Group plans to continue exploring the potential of diffusion models: scaling up the parameter count, integrating reinforcement learning and reasoning ("thinking") paradigms more deeply, and pushing generative AI forward.
Model collection: https://huggingface.co/collections/inclusionAI/llada-20






