Alibaba has open-sourced its latest model, Qwen3-Next-80B-A3B, marking a significant step forward in the company's generative AI efforts. The model introduces innovations in hybrid attention mechanisms, a highly sparse Mixture-of-Experts (MoE) design, and training methods, delivering substantial performance gains.


Qwen3-Next has 80 billion total parameters, but only 3 billion are activated during inference. Its training cost is roughly 90% lower than that of its predecessor Qwen3-32B, and its inference efficiency is as much as ten times higher, with particularly strong gains on ultra-long texts (over 32K tokens). As a result, Qwen3-Next matches or even surpasses Alibaba's flagship model Qwen3-235B on instruction following and long-context tasks, and outperforms Google's latest Gemini-2.5-Flash reasoning model.

The core innovation of this model is its hybrid attention architecture, which combines Gated DeltaNet (a linear-attention variant) with gated standard attention layers. This design addresses the weaknesses of conventional attention mechanisms in long contexts, preserving speed while strengthening in-context learning. The model also uses a highly sparse MoE structure, maximizing resource utilization during training without compromising performance.
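The routing idea behind a highly sparse MoE layer can be sketched as follows. This is a toy illustration of top-k expert routing in general, not Qwen's actual implementation; the function name, dimensions, and expert count here are all hypothetical:

```python
import numpy as np

def topk_moe_forward(x, gate_w, experts, k=2):
    """Toy top-k MoE layer: route a token vector to the k highest-scoring
    experts and mix their outputs by normalized gate weights.
    `experts` is a list of dense weight matrices, one per expert."""
    logits = x @ gate_w                        # router scores, one per expert
    topk = np.argsort(logits)[-k:]             # indices of the k best experts
    probs = np.exp(logits[topk] - logits[topk].max())
    probs /= probs.sum()                       # softmax over the selected experts only
    # Only the k chosen experts run; the rest stay idle -- that is the sparsity.
    return sum(p * (x @ experts[i]) for p, i in zip(probs, topk))

rng = np.random.default_rng(0)
d, n_experts = 8, 16
x = rng.standard_normal(d)
gate_w = rng.standard_normal((d, n_experts))
experts = [rng.standard_normal((d, d)) for _ in range(n_experts)]
y = topk_moe_forward(x, gate_w, experts, k=2)  # only 2 of 16 experts compute
```

Because per-token compute scales with k rather than with the total expert count, a model can hold very large total parameters while activating only a small fraction per token, which is how an 80B-parameter model can run with ~3B active parameters.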

In addition, Qwen3-Next introduces a multi-token prediction (MTP) mechanism, which improves its performance in speculative decoding. In pre-training, Qwen3-Next is markedly more efficient than Qwen3-32B: its training cost is only 9.3% of Qwen3-32B's, yet it achieves better results. In inference, Qwen3-Next delivers 7x higher throughput than Qwen3-32B on long texts, and maintains a 10x speed advantage at even longer context lengths.
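The speculative-decoding loop that multi-token prediction accelerates can be sketched like this. It is a greedy toy version with stand-in "models" (plain functions over integer tokens); the real acceptance rule and the MTP head itself are more involved:

```python
def speculative_decode_step(draft_next, target_next, prefix, k=4):
    """One greedy speculative-decoding step (illustrative only):
    a cheap draft model proposes k tokens, the target model checks them
    and keeps the longest agreeing prefix, plus one corrected token
    at the first divergence."""
    # Draft phase: propose k tokens autoregressively with the cheap model.
    proposal, ctx = [], list(prefix)
    for _ in range(k):
        t = draft_next(ctx)
        proposal.append(t)
        ctx.append(t)
    # Verify phase: the target model scores the proposed positions
    # (in a real system, all k positions in a single forward pass).
    accepted, ctx = [], list(prefix)
    for t in proposal:
        if target_next(ctx) == t:
            accepted.append(t)           # draft agreed with target: keep it
            ctx.append(t)
        else:
            accepted.append(target_next(ctx))  # target's own correction
            break
    return accepted

# Toy "models": the next token is (last + 1) mod 10; the draft goes wrong
# once the context grows past 4 tokens.
target = lambda ctx: (ctx[-1] + 1) % 10
draft = lambda ctx: (ctx[-1] + 1) % 10 if len(ctx) < 5 else 0
out = speculative_decode_step(draft, target, [1, 2], k=4)  # -> [3, 4, 5, 6]
```

The payoff is that each verified proposal yields several tokens for roughly the cost of one large-model forward pass; a dedicated MTP head makes the drafts agree with the main model more often, raising the acceptance rate.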


Beyond its technical breakthroughs, Alibaba's new model has drawn wide attention and praise, especially among developers and researchers. In both technological innovation and market competitiveness, Qwen3-Next underscores Alibaba's growing leadership in the field of artificial intelligence.

Online experience: https://chat.qwen.ai/

Open source address: https://huggingface.co/collections/Qwen/qwen3-next-68c25fd6838e585db8eeea9d

Key points:

🌟 The Qwen3-Next-80B-A3B model has 80 billion total parameters but activates only 3 billion per token, cutting training cost by about 90% and improving inference efficiency roughly tenfold.

🔍 The new model adopts a hybrid attention architecture (Gated DeltaNet plus gated attention) and a multi-token prediction mechanism, significantly enhancing long-context processing.

🚀 In inference speed, Qwen3-Next excels in ultra-long-text scenarios, with throughput 7 to 10 times that of Qwen3-32B.