RND1 is an experimental diffusion language model with 30 billion parameters, built on a sparse mixture-of-experts (MoE) architecture. It is converted from a pre-trained autoregressive base model and supports diffusion-based text generation. Only 3 billion parameters are activated per token, balancing computational efficiency against model capacity.
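
Below is a minimal usage sketch, assuming the checkpoint is published on the Hugging Face Hub and loads through Transformers with `trust_remote_code`. The repository ID, the auto class, and the `generate()` interface for the diffusion sampler are assumptions for illustration, not documented API.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Placeholder repository ID; substitute the actual RND1 checkpoint name.
model_id = "radicalnumerics/RND1-Base"

tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # 30B total parameters, ~3B active per token (sparse MoE)
    device_map="auto",
    trust_remote_code=True,      # diffusion generation logic ships with the remote model code
)

prompt = "Explain diffusion language models in two sentences."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

# The diffusion sampler is assumed to be exposed through generate(); the exact
# arguments (e.g. number of denoising steps) may differ from autoregressive models.
outputs = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```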