JD.com officially open-sourced its latest large model, JoyAI-LLM-Flash, on the Hugging Face platform on February 14. The model has 4.8 billion total parameters, of which 3 billion are activated, and was pre-trained on 20 trillion text tokens, showing strong performance on up-to-date knowledge, reasoning, and programming tasks.
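For readers who want to try the released checkpoint, below is a minimal loading sketch using the Hugging Face transformers library. The repository ID `jd-ai/JoyAI-LLM-Flash` is a placeholder assumption, since the article does not state the exact model ID; check the official Hugging Face page before running.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Hypothetical repository name -- verify the real model ID on Hugging Face.
MODEL_ID = "jd-ai/JoyAI-LLM-Flash"

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID,
    torch_dtype="auto",      # let the checkpoint decide the precision
    device_map="auto",       # shard across available GPUs (requires accelerate)
    trust_remote_code=True,  # custom MoE architectures usually ship their own code
)

prompt = "Explain multi-token prediction in one sentence."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```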
JoyAI-LLM-Flash adopts a new FiberPO optimization framework that brings fiber bundle theory into reinforcement learning, combined with the Muon optimizer and dense multi-token prediction (MTP), to address the instability issues that arise when scaling traditional models. Compared with the non-MTP version, throughput improves by 1.3 to 1.7 times, substantially enhancing the model's training efficiency and application potential.
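As a rough illustration of the MTP idea only (not JD.com's actual module), the PyTorch sketch below adds extra prediction heads so each position is trained to predict several future tokens; all class names, sizes, and the number of future offsets are illustrative assumptions.

```python
import torch
import torch.nn as nn

class MTPHead(nn.Module):
    """Minimal dense multi-token prediction head: from each position's hidden
    state, predict the next `n_future` tokens with separate linear projections.
    Illustrative sketch only, not JoyAI-LLM-Flash's implementation."""

    def __init__(self, hidden_size: int, vocab_size: int, n_future: int = 2):
        super().__init__()
        self.heads = nn.ModuleList(
            nn.Linear(hidden_size, vocab_size, bias=False) for _ in range(n_future)
        )

    def forward(self, hidden_states: torch.Tensor) -> list[torch.Tensor]:
        # hidden_states: (batch, seq_len, hidden_size)
        # Returns one logits tensor per future offset (t+1, t+2, ...).
        return [head(hidden_states) for head in self.heads]

# Training-time usage: sum the cross-entropy losses over each future offset.
hidden = torch.randn(2, 16, 1024)            # toy hidden states
labels = torch.randint(0, 129_000, (2, 16))  # toy token ids
mtp = MTPHead(hidden_size=1024, vocab_size=129_000, n_future=2)
loss = torch.tensor(0.0)
for offset, logits in enumerate(mtp(hidden), start=1):
    shifted = labels[:, offset:]              # targets shifted by `offset`
    pred = logits[:, : shifted.size(1)]       # drop positions with no target
    loss = loss + nn.functional.cross_entropy(
        pred.reshape(-1, pred.size(-1)), shifted.reshape(-1)
    )
print(loss)
```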
Architecturally, JoyAI-LLM-Flash is a 40-layer mixture-of-experts (MoE) model supporting a 128K context length and a 129K-token vocabulary, marking a significant advance for JD.com in the AI field.
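To show what a mixture-of-experts feed-forward layer looks like in general terms, here is a minimal top-k routing sketch in PyTorch; the expert count, top-k value, and layer sizes are placeholder assumptions and not JoyAI-LLM-Flash's published configuration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TopKMoE(nn.Module):
    """Illustrative top-k mixture-of-experts feed-forward layer.
    All hyperparameters here are placeholders."""

    def __init__(self, hidden_size: int, ffn_size: int,
                 n_experts: int = 8, top_k: int = 2):
        super().__init__()
        self.top_k = top_k
        self.router = nn.Linear(hidden_size, n_experts, bias=False)
        self.experts = nn.ModuleList(
            nn.Sequential(
                nn.Linear(hidden_size, ffn_size),
                nn.GELU(),
                nn.Linear(ffn_size, hidden_size),
            )
            for _ in range(n_experts)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (tokens, hidden_size). Route each token to its top-k experts and
        # combine expert outputs weighted by the normalized router scores.
        scores = F.softmax(self.router(x), dim=-1)
        weights, indices = scores.topk(self.top_k, dim=-1)
        weights = weights / weights.sum(dim=-1, keepdim=True)
        out = torch.zeros_like(x)
        for k in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = indices[:, k] == e
                if mask.any():
                    out[mask] += weights[mask, k : k + 1] * expert(x[mask])
        return out

layer = TopKMoE(hidden_size=1024, ffn_size=4096)
tokens = torch.randn(32, 1024)
print(layer(tokens).shape)  # torch.Size([32, 1024])
```

Because only the top-k experts run for each token, the number of activated parameters per token stays well below the total parameter count, which is the general mechanism behind figures like "4.8 billion total, 3 billion activated."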