In recent years, the concept of the "chain of thought" has attracted growing attention in artificial intelligence research, especially in the training and inference of large language models. Recently, Professor Guojun Qi's team at the MAPLE Lab of Westlake University proposed the "diffusive divergent chain of thought," a new reasoning method designed specifically for diffusion language models.
Traditional large language models usually adopt a linear chain of thought, reasoning step by step toward an answer. Human thinking, however, is often more complex: it is non-linear and full of leaps. Professor Qi's team believes that mimicking this divergent style of thinking can improve a model's creativity and problem-solving ability.
The core idea of the diffusive divergent chain of thought is to let the model generate intermediate results in any order during reasoning, without requiring them to follow conventional syntactic structure or to be readable. This lets the model explore more diverse reasoning paths and produce more creative and flexible answers. The method has been applied to several diffusion language models, where it outperforms existing models on mathematical reasoning and code generation tasks.
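To make "any order" concrete, here is a minimal toy sketch of a decoding loop that fills in a fully masked sequence by revealing positions wherever a scorer is most confident, rather than left to right. Everything in it is an assumption for illustration: `MASK`, `toy_denoiser`, and `any_order_decode` are hypothetical names, the "denoiser" simply looks up a known target and attaches a random confidence, and confidence-based selection is a stand-in rather than the team's actual unmasking rule.

```python
import random

MASK = "<mask>"  # hypothetical mask symbol


def toy_denoiser(seq, target, rng):
    """Stand-in for a diffusion language model (assumption, not the real model).

    For every masked position it proposes a token and a confidence score.
    Here the token is read from a known target and the score is random.
    """
    return {
        i: (target[i], rng.random())
        for i, tok in enumerate(seq)
        if tok == MASK
    }


def any_order_decode(target, tokens_per_step=2, seed=0):
    """Start from an all-mask sequence and reveal tokens in arbitrary order."""
    rng = random.Random(seed)
    seq = [MASK] * len(target)
    history = [list(seq)]  # intermediate states of the denoising process
    while MASK in seq:
        proposals = toy_denoiser(seq, target, rng)
        # Reveal the most confident positions, wherever they sit in the sequence.
        chosen = sorted(proposals, key=lambda i: proposals[i][1], reverse=True)
        for i in chosen[:tokens_per_step]:
            seq[i] = proposals[i][0]
        history.append(list(seq))
    return seq, history


target = "the answer is 4 2".split()
final, history = any_order_decode(target)
for step, state in enumerate(history):
    print(step, " ".join(state))
```

The intermediate states in `history` are generally not readable left-to-right text, which is the point: only the final state has to be a well-formed answer.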
To implement this, the team used reinforcement learning to optimize the entire generation process. The model starts from a fully masked sequence that carries no information, progressively generates key information, and arrives at the final answer over the course of denoising. Unlike a traditional chain of thought, the diffusive divergent chain of thought can draw on the content generated at intermediate denoising steps to improve the accuracy of the final answer.
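The training idea can be sketched with a small REINFORCE-style toy, shown below under strong simplifying assumptions. It is not the team's implementation: the policy is tabular, `TARGET`, `run_episode`, `reinforce_update`, and `train` are hypothetical names, and the reward is simply the fraction of final tokens that match a known answer. What it does illustrate is that every decision in the denoising trajectory (which position to reveal and which token to write) is treated as an action, and that all of those actions are reinforced by a reward computed only from the final sequence.

```python
import math
import random

MASK = None
TARGET = [2, 0, 1, 2]   # toy "correct answer" (hypothetical)
N, V = len(TARGET), 3   # sequence length, vocabulary size


def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sample(probs, rng):
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


def run_episode(order_logits, token_logits, rng):
    """Denoise from an all-mask sequence, recording every sampled action."""
    seq = [MASK] * N
    steps = []  # (parameter list, indices into it, probabilities, chosen slot)
    while MASK in seq:
        masked = [i for i, tok in enumerate(seq) if tok is MASK]
        # Action 1: which masked position to reveal (any order is allowed).
        p_pos = softmax([order_logits[i] for i in masked])
        k = sample(p_pos, rng)
        pos = masked[k]
        steps.append((order_logits, masked, p_pos, k))
        # Action 2: which token to write at that position.
        p_tok = softmax(token_logits[pos])
        tok = sample(p_tok, rng)
        steps.append((token_logits[pos], list(range(V)), p_tok, tok))
        seq[pos] = tok
    # Outcome-only reward: computed from the final sequence, not from the steps.
    reward = sum(a == b for a, b in zip(seq, TARGET)) / N
    return seq, steps, reward


def reinforce_update(steps, advantage, lr):
    """Apply advantage * grad log pi(action) for every action in the trajectory."""
    for params, idxs, probs, chosen in steps:
        for j, idx in enumerate(idxs):
            grad = (1.0 if j == chosen else 0.0) - probs[j]
            params[idx] += lr * advantage * grad


def train(episodes=3000, lr=0.5, seed=0):
    rng = random.Random(seed)
    order_logits = [0.0] * N
    token_logits = [[0.0] * V for _ in range(N)]
    baseline = 0.0  # running average of the reward, used to reduce variance
    for _ in range(episodes):
        _, steps, reward = run_episode(order_logits, token_logits, rng)
        reinforce_update(steps, reward - baseline, lr)
        baseline = 0.9 * baseline + 0.1 * reward
    return order_logits, token_logits


order_logits, token_logits = train()
greedy = [max(range(V), key=lambda v: row[v]) for row in token_logits]
print("greedy answer after training:", greedy)
```

In this toy the reveal order has no effect on the reward, so the order logits receive no useful signal; in a real diffusion language model, later predictions depend on what has already been unmasked, which is why learning the order can matter.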
The research team's findings indicate that the diffusive divergent chain of thought not only improves the model's reasoning ability but also offers useful insights for future model training. With diffusion language models drawing wider attention, for example through Google's recently released Gemini Diffusion model, the approach may find broader application. The team expects the diffusive divergent chain of thought to become a standard training procedure for diffusion language models.
arXiv link: https://arxiv.org/abs/2505.10446
GitHub link: https://github.com/maple-research-lab/LLaDOU