AI model architectures are undergoing a profound transformation. Diffusion language models, with their parallel generation and efficient inference, are drawing growing industry attention. On October 9th, the AI research lab Radical Numerics released RND1-Base, the largest open-source diffusion language model to date: 30B parameters, of which 3B are active, built on a sparse mixture-of-experts (MoE) architecture. The model not only performs well on benchmarks but also ships with full weights, training recipes, and inference code, aiming to accelerate post-training and inference research on diffusion language models.
RND1-Base starts from the autoregressive base model Qwen3-30B-A3B and is converted to the diffusion paradigm through simple continued pre-training. The conversion replaces causal attention with a bidirectional masking scheme and applies layer-specific learning rates to preserve the knowledge already in the base model, while large-batch training of up to 8M tokens keeps optimization stable; pre-training then continues for 500B tokens. This approach avoids the cost of training a diffusion model from scratch and reflects Radical Numerics' emphasis on reusing existing models.
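To make the conversion recipe concrete, here is a minimal, hypothetical sketch of what such continued pre-training could look like: a bidirectional (non-causal) transformer trained to recover randomly masked tokens, with layer-specific learning rates so that lower layers, which carry most of the base model's knowledge, move more slowly. The toy model, schedule, and hyperparameters below are illustrative assumptions, not RND1's actual code or settings.

```python
# Minimal sketch (not the official recipe): continued pre-training that nudges an
# autoregressive backbone toward a masked-diffusion objective. Model size, learning
# rates, and the masking schedule are illustrative placeholders, not RND1's values.
import torch
import torch.nn as nn

VOCAB, MASK_ID, D_MODEL, N_LAYERS = 32_000, 0, 512, 8

class ToyBidirectionalLM(nn.Module):
    """Stand-in for an AR backbone whose causal attention mask has been dropped."""
    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(VOCAB, D_MODEL)
        block = nn.TransformerEncoderLayer(D_MODEL, nhead=8, batch_first=True)
        self.layers = nn.TransformerEncoder(block, num_layers=N_LAYERS)  # no causal mask: bidirectional
        self.head = nn.Linear(D_MODEL, VOCAB)

    def forward(self, ids):
        return self.head(self.layers(self.embed(ids)))

model = ToyBidirectionalLM()

# Layer-specific learning rates: earlier layers move less, preserving AR knowledge.
groups = []
for i, block in enumerate(model.layers.layers):
    lr = 1e-4 * (0.5 + 0.5 * i / (N_LAYERS - 1))          # hypothetical depth-dependent schedule
    groups.append({"params": block.parameters(), "lr": lr})
groups.append({"params": list(model.embed.parameters()) + list(model.head.parameters()), "lr": 1e-4})
opt = torch.optim.AdamW(groups)

# One masked-denoising step: corrupt a random fraction of tokens, then train the
# model to recover them from bidirectional context (instead of next-token prediction).
ids = torch.randint(1, VOCAB, (4, 128))                    # a dummy batch of token ids
mask_ratio = torch.rand(4, 1).clamp(min=0.15)              # per-sequence noise level
mask = torch.rand(4, 128) < mask_ratio
corrupted = ids.masked_fill(mask, MASK_ID)
logits = model(corrupted)
loss = nn.functional.cross_entropy(logits[mask], ids[mask])
loss.backward()
opt.step()
opt.zero_grad()
```

The key idea is that only the objective changes (masked denoising instead of next-token prediction), while the weights and most of the optimization setup are carried over from the autoregressive base.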
Unlike traditional autoregressive language models that generate tokens one at a time, RND1 treats text generation as a process akin to image denoising: it starts from a fully noised (masked) sequence and refines the whole sequence in parallel, using bidirectional attention. This increases the flexibility and controllability of generation and can significantly reduce inference latency, making the approach particularly attractive for complex reasoning and code generation tasks.
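The generation loop itself can be sketched in a few lines. The sampler below is a generic masked-diffusion decoder, not RND1's released sampler: it starts from an all-mask sequence and, over a fixed number of parallel refinement steps, commits the most confident predictions while leaving the rest masked for later steps.

```python
# Illustrative masked-diffusion decoder (a generic sketch, not RND1's released sampler):
# begin with an all-[MASK] sequence and refine it in parallel over a few steps,
# committing the highest-confidence predictions at each step.
import torch

@torch.no_grad()
def diffusion_decode(model, seq_len=64, steps=8, mask_id=0):
    ids = torch.full((1, seq_len), mask_id, dtype=torch.long)    # pure "noise": everything masked
    done = torch.zeros(1, seq_len, dtype=torch.bool)
    for step in range(steps):
        if done.all():
            break
        logits = model(ids)                                      # one bidirectional pass over the whole sequence
        conf, pred = logits.softmax(-1).max(-1)                  # per-position confidence and argmax token
        conf = conf.masked_fill(done, -1.0)                      # never re-select committed positions
        remaining = seq_len - int(done.sum())
        k = max(1, remaining // (steps - step))                  # commit an equal share each step
        top = conf.topk(k, dim=-1).indices
        ids.scatter_(1, top, pred.gather(1, top))
        done.scatter_(1, top, True)
    return ids

# Usage with the toy model from the earlier sketch (any (B, L) -> (B, L, vocab) model works):
# out = diffusion_decode(model, seq_len=32, steps=4, mask_id=MASK_ID)
```

Because every step scores the entire sequence at once, the number of forward passes is tied to the number of refinement steps rather than the sequence length, which is where the latency advantage over token-by-token decoding comes from.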
In general benchmark tests, RND1-Base performs strongly, surpassing earlier open-source diffusion language models such as Dream-7B and LLaDA-8B. Reported results include 57.2% on MMLU (multi-task language understanding), 72.1% on GSM8K (mathematical reasoning), and 51.3% on MBPP (code generation). These benchmarks span reasoning, STEM, and programming, suggesting that the conversion preserves much of the autoregressive base's capability while gaining the benefits of the diffusion architecture.
RND1's sparse mixture-of-experts design activates only 3B of its 30B parameters per token, improving computational efficiency and making large-scale deployment more practical. The model has not yet gone through post-training and can occasionally repeat itself under greedy sampling, but the open-source code integrates FlashInfer and SGLang backends to support fast inference iteration.
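For readers unfamiliar with sparse MoE, the sketch below shows the routing idea behind "activates only 3B of 30B parameters": each token is sent to a small top-k subset of expert feed-forward networks, so per-token compute scales with the active experts rather than the full parameter count. The expert count, top-k value, and dimensions are invented for illustration and are not RND1's configuration.

```python
# Hedged sketch of sparse mixture-of-experts routing, to illustrate why only a small
# fraction of total parameters is active per token. Expert count, top-k, and
# dimensions are made up for illustration, not RND1's actual configuration.
import torch
import torch.nn as nn

class SparseMoE(nn.Module):
    def __init__(self, d_model=512, n_experts=16, top_k=2):
        super().__init__()
        self.router = nn.Linear(d_model, n_experts)              # scores each token against each expert
        self.experts = nn.ModuleList([
            nn.Sequential(nn.Linear(d_model, 4 * d_model), nn.GELU(), nn.Linear(4 * d_model, d_model))
            for _ in range(n_experts)
        ])
        self.top_k = top_k

    def forward(self, x):                                        # x: (tokens, d_model)
        weights, idx = self.router(x).topk(self.top_k, dim=-1)   # pick the top-k experts per token
        weights = weights.softmax(dim=-1)                        # normalize over the chosen experts only
        out = torch.zeros_like(x)
        for slot in range(self.top_k):
            for e, expert in enumerate(self.experts):
                hit = idx[:, slot] == e                          # tokens routed to expert e in this slot
                if hit.any():
                    out[hit] += weights[hit, slot:slot + 1] * expert(x[hit])
        return out                                               # each token only ever touches top_k experts

moe = SparseMoE()
print(moe(torch.randn(10, 512)).shape)                           # torch.Size([10, 512])
```

In a full model, a block like this replaces the dense feed-forward sublayer in each transformer layer; RND1 inherits its MoE layout from the Qwen3-30B-A3B base it was converted from.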
Radical Numerics positions itself as a next-generation AI lab focused on building a recursive self-improvement engine. RND1 is a product of that vision: through an automated AI research platform, models help optimize the next generation of models. The team includes researchers and engineers from top institutions such as DeepMind, Meta, Liquid, and Stanford, with the goal of enabling AI to design AI autonomously and accelerating scientific and industrial discovery.
The purpose of open-sourcing RND1 is to encourage the community to explore diffusion language models for inference optimization and post-training. Diffusion models in the language domain are moving from experiments toward practical use, with particular advantages in parallel generation of long sequences. Industry observers expect the release to spur more experiments in converting autoregressive models into diffusion models, filling a gap in the open-source ecosystem for efficient generative models.
Although RND1 leads in scale and performance, the generalization ability and memory overhead of diffusion models still leave room for improvement. Combining the approach with multi-objective fine-tuning or hybrid architectures is expected to further unlock its potential. Radical Numerics is also hiring, welcoming AI professionals to join this cutting-edge exploration.
This breakthrough marks an important turning point for diffusion language models, transitioning from theoretical exploration to engineering practice. By open-sourcing such a large-scale diffusion model, Radical Numerics not only provides the research community with valuable tools, but also opens new possibilities for AI self-improvement and recursive optimization. As more researchers get involved in this field, diffusion language models may become a key direction for the next generation of AI architectures.