On May 21st, Tencent announced a comprehensive upgrade of its Hunyuan large language model suite, continuing the advancement of its technical capabilities in artificial intelligence. The upgrade spans multiple areas: enhancements to the flagship fast-thinking model Hunyuan TurboS and the deep-thinking model Hunyuan T1, plus a new visual deep-reasoning model, T1-Vision, and an end-to-end voice-call model, Hunyuan Voice, both built on the TurboS base model. Tencent also updated a series of multimodal models, including Hunyuan Image 2.0, Hunyuan 3D v2.5, and Hunyuan Game Visual Generation.

On Chatbot Arena, a widely recognized large language model evaluation platform, Hunyuan TurboS has climbed into the global top eight, ranking second among Chinese models after DeepSeek. The gains come from an increased token count in the pre-training phase and the introduction of long-and-short chain-of-thought fusion in post-training, which significantly improved its scores in scientific reasoning, coding, and competition mathematics. Released at the beginning of the year, Hunyuan TurboS is the industry's first large-scale hybrid Mamba-MoE model, showing notable advantages in both quality and performance.


The deep-thinking model Hunyuan T1 has iterated rapidly since its launch at the beginning of the year and recently received a new upgrade, improving in competition mathematics, common-sense question answering, and complex-task agent capabilities. Building on the TurboS base model, Hunyuan has further extended its multimodal understanding of images and audio. The newly released visual deep-reasoning model T1-Vision supports multi-image input and native long chains of thought, enabling it to "think while looking at pictures"; compared with previous cascaded solutions, it delivers significant gains in both overall quality and understanding speed. The end-to-end voice-call model Hunyuan Voice achieves low-latency voice calls, with response speed improved by more than 30%, along with noticeably more human-like and emotionally expressive conversation, and has already begun a gray-scale rollout in the Tencent Yuanbao app.

In multimodal generation, Hunyuan Image 2.0 was the first to achieve "millisecond-level" image generation, with an accuracy rate exceeding 95% on the GenEval benchmark, and it performed strongly in human evaluations of subjective image quality and aesthetics. Hunyuan 3D v2.5 delivers a generational leap in controllability and ultra-high-definition generation through an industry-first sparse 3D-native architecture, increasing geometric model precision tenfold and supporting 4K texture maps. In end-to-end evaluations, both Hunyuan Text-to-3D and Hunyuan Image-to-3D achieved excellent results.

In the gaming sector, Hunyuan introduced the Hunyuan Game Visual Generation model, which is fluent in game art styles and terminology and comprises five sub-models, including game skill-effect generation, dynamic character portraits, a real-time interactive game world model, and character multi-view generation. The Hunyuan Game Visual Generation platform has officially launched, targeting industrial-grade game asset production and raising the efficiency of game art design by tens of times. Hunyuan will soon release its first large-scale, roamable 3D scene generation model, supporting immersive interaction, scene generation in diverse styles, and 360-degree panoramic roaming, supporting innovation in industries such as gaming and embodied intelligence.

Wang Di, Vice President of Tencent Cloud and head of the Tencent Hunyuan large language model, said that Hunyuan is accelerating toward stronger intelligence in both depth and breadth, providing solid support for the popularization of AI and for industrial upgrading. Hunyuan firmly embraces open source and continues to release models across a full range of sizes and scenarios. It has already open-sourced models across modalities including image, video, 3D, and text, with downloads of the Hunyuan 3D models on Hugging Face exceeding 1.6 million. Going forward, Hunyuan plans to release hybrid-reasoning models in multiple sizes to meet the differing needs of enterprises and edge devices, and will continue to open-source its image, video, 3D, and other multimodal foundation models along with their accompanying plugin models.