Early this morning, the Alibaba Tongyi Qianwen team released the Qwen2 series of open-source models. The series includes pre-trained and instruction-tuned models in five sizes: Qwen2-0.5B, Qwen2-1.5B, Qwen2-7B, Qwen2-57B-A14B, and Qwen2-72B. Across these parameter scales, the new models deliver significantly better performance than the previous generation, Qwen1.5.

Regarding multilingual capabilities, the Qwen2 series invested heavily in increasing both the quantity and quality of the training data, which covers 27 languages in addition to English and Chinese. Comparative testing shows that the large models (over 70B parameters) excel in natural language understanding, coding, mathematics, and more; the flagship Qwen2-72B even surpasses the previous generation's Qwen1.5-110B while using fewer parameters.

The Qwen2 models not only demonstrate strong capabilities in basic language model evaluations but also achieve remarkable results in instruction-tuned model assessments. Their multilingual abilities shine in benchmarks like M-MMLU and MGSM, showcasing the powerful potential of Qwen2 instruction-tuned models.

The release of the Qwen2 series marks a new height in artificial intelligence technology, providing broader possibilities for global AI applications and commercialization. Looking ahead, Qwen2 will further expand model sizes and multimodal capabilities, accelerating the development of the open-source AI field.

Model Information

The Qwen2 series includes five sizes of base and instruction-tuned models, including Qwen2-0.5B, Qwen2-1.5B, Qwen2-7B, Qwen2-57B-A14B, and Qwen2-72B. We have outlined the key information for each model in the table below:

| Models | Qwen2-0.5B | Qwen2-1.5B | Qwen2-7B | Qwen2-57B-A14B | Qwen2-72B |
|---|---|---|---|---|---|
| # Parameters | 0.49B | 1.54B | 7.07B | 57.41B | 72.71B |
| # Non-Emb Parameters | 0.35B | 1.31B | 5.98B | 56.32B | 70.21B |
| GQA | True | True | True | True | True |
| Tie Embedding | True | True | False | False | False |
| Context Length | 32K | 32K | 128K | 64K | 128K |
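
For reference, here is a minimal sketch of loading one of these checkpoints with Hugging Face transformers and running a single chat turn; the repository name follows the naming used on the Qwen organization on Hugging Face, and the generation settings are illustrative rather than official defaults.

```python
# Minimal sketch: load a Qwen2 instruction-tuned checkpoint with Hugging Face
# transformers (>=4.37, which added Qwen2 support) and run one chat turn.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "Qwen/Qwen2-7B-Instruct"  # any size in the table works the same way
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",   # pick bf16/fp16 automatically where available
    device_map="auto",    # spread layers across available GPUs
)

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Briefly explain what Group Query Attention is."},
]
prompt = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
inputs = tokenizer([prompt], return_tensors="pt").to(model.device)
output_ids = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(
    output_ids[0][inputs.input_ids.shape[-1]:], skip_special_tokens=True
))
```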

Specifically, in Qwen1.5 only Qwen1.5-32B and Qwen1.5-110B used Group Query Attention (GQA). This time, we applied GQA to all model sizes so that every model benefits from faster inference and lower memory usage. For the smaller models, we use tied embeddings, because the large, sparse embedding matrices account for a significant portion of those models' total parameters.
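
These architectural choices are visible in the published configuration files. The sketch below is a rough illustration, assuming the standard Hugging Face Qwen2Config attribute names: it checks whether a checkpoint uses GQA (fewer key/value heads than query heads) and whether its embeddings are tied.

```python
# Sketch: inspect published configs to confirm GQA and tied embeddings.
# Attribute names follow the Hugging Face Qwen2Config; no head counts are
# asserted here -- they are simply read from whatever the config reports.
from transformers import AutoConfig

for name in ["Qwen/Qwen2-0.5B", "Qwen/Qwen2-7B", "Qwen/Qwen2-72B"]:
    cfg = AutoConfig.from_pretrained(name)
    uses_gqa = cfg.num_key_value_heads < cfg.num_attention_heads
    print(
        f"{name}: {cfg.num_attention_heads} query heads, "
        f"{cfg.num_key_value_heads} KV heads (GQA={uses_gqa}), "
        f"tied embeddings={cfg.tie_word_embeddings}"
    )
```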

In terms of context length, all base language models were pre-trained on data with a context length of 32K tokens, and we observed satisfactory extrapolation up to 128K in perplexity (PPL) evaluations. For the instruction-tuned models, however, PPL alone is not enough; the models must correctly understand long contexts and complete tasks. The table above lists the context length capabilities of the instruction-tuned models, evaluated on the Needle in a Haystack task. Notably, when enhanced with YaRN, both Qwen2-7B-Instruct and Qwen2-72B-Instruct handle context lengths of up to 128K tokens.
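
As a rough illustration of how the YaRN extension is enabled in practice, the sketch below patches a local checkpoint's config.json with a rope_scaling entry, following the pattern described in the Qwen2 usage documentation for serving 128K contexts; the local path and the 4.0 scaling factor (32K × 4 = 128K) are assumptions here, not quoted defaults.

```python
# Sketch: enable YaRN rope scaling on a downloaded Qwen2-7B-Instruct checkpoint
# by patching its config.json before serving long contexts (e.g. with vLLM).
# The path and scaling factor below are illustrative assumptions.
import json
from pathlib import Path

config_path = Path("Qwen2-7B-Instruct/config.json")  # local checkpoint directory
config = json.loads(config_path.read_text())

config["rope_scaling"] = {
    "type": "yarn",
    "factor": 4.0,                              # 32K * 4 = 128K target window
    "original_max_position_embeddings": 32768,  # pre-training context length
}

config_path.write_text(json.dumps(config, indent=2))
```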

We have made significant efforts to increase the quantity and quality of the pre-training and instruction-tuning datasets, which cover multiple languages besides English and Chinese, to enhance their multilingual capabilities. Although large language models inherently have the ability to generalize to other languages, we explicitly emphasize that we have included 27 other languages in our training:

| Region | Languages |
|---|---|
| Western Europe | German, French, Spanish, Portuguese, Italian, Dutch |
| Eastern and Central Europe | Russian, Czech, Polish |
| Middle East | Arabic, Persian, Hebrew, Turkish |
| East Asia | Japanese, Korean |
| Southeast Asia | Vietnamese, Thai, Indonesian, Malay, Lao, Burmese, Cebuano, Khmer, Tagalog |
| South Asia | Hindi, Bengali, Urdu |

Additionally, we have invested considerable effort in addressing the issue of code-switching that often arises in multilingual evaluations. Therefore, our models' ability to handle this phenomenon has significantly improved. Evaluations using prompts that typically trigger cross-language code-switching have confirmed a significant reduction in related issues.

Performance

Comparative test results show that the performance of large-scale models (with over 70B parameters) has significantly improved compared to Qwen1.5. This test centers on the large-scale model Qwen2-72B. In terms of base language models, we compared the performance of Qwen2-72B with the current best open-source models in natural language understanding, knowledge acquisition, programming abilities, mathematical abilities, multilingual abilities, and more. Thanks to carefully selected datasets and optimized training methods, Qwen2-72B outperforms leading models like Llama-3-70B, and even surpasses the previous generation Qwen1.5-110B with fewer parameters.

After extensive large-scale pre-training, we conducted post-training to further enhance Qwen's intelligence, bringing it closer to human capabilities. This process further improved the model's abilities in coding, mathematics, reasoning, instruction following, multilingual understanding, and more. It also aligns the model's outputs with human values, ensuring they are helpful, honest, and harmless.

Our post-training phase is designed around the principles of scalable training and minimal human annotation. Specifically, we studied how to obtain high-quality, reliable, diverse, and creative demonstration and preference data through various automated alignment strategies, such as rejection sampling for mathematics, execution feedback for coding and instruction following, back-translation for creative writing, and scalable supervision for role-playing. For training, we combined supervised fine-tuning, reward model training, and online DPO training, and we adopted a novel online merging optimizer to minimize the alignment tax. Together, these efforts significantly enhanced the capabilities and intelligence of our models, as the evaluations below show.
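
To make the rejection-sampling strategy concrete, here is a minimal sketch (not the team's actual pipeline) that keeps only sampled math solutions whose final answer matches a reference answer; sample_solutions and final_answer are hypothetical helpers supplied by the caller.

```python
# Illustrative sketch of rejection sampling for math demonstration data:
# sample several candidate solutions per problem and keep only those whose
# final answer matches the known reference. `sample_solutions` and
# `final_answer` are hypothetical helpers, not part of any released pipeline.
from typing import Callable, List, Tuple


def rejection_sample_math(
    problems: List[Tuple[str, str]],                 # (question, reference_answer)
    sample_solutions: Callable[[str, int], List[str]],
    final_answer: Callable[[str], str],
    num_samples: int = 8,
) -> List[Tuple[str, str]]:
    """Return (question, solution) pairs whose final answer is correct."""
    kept = []
    for question, reference in problems:
        for solution in sample_solutions(question, num_samples):
            if final_answer(solution).strip() == reference.strip():
                kept.append((question, solution))
                break  # one verified demonstration per problem is enough here
    return kept
```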

We conducted a comprehensive evaluation of Qwen2-72B-Instruct across 16 benchmarks spanning various domains. Qwen2-72B-Instruct strikes a balance between stronger capabilities and alignment with human values: it significantly outperforms Qwen1.5-72B-Chat on every benchmark and is competitive with Llama-3-70B-Instruct.

At smaller scales, the Qwen2 models also outperform similar or even larger SOTA models. Compared with recently released SOTA models, Qwen2-7B-Instruct still shows an advantage across various benchmarks, especially on coding and Chinese-related metrics.

Highlights

Coding and Mathematics

We have always been committed to enhancing Qwen's advanced features, especially in coding and mathematics. In coding, we successfully integrated CodeQwen1.5's code training experience and data, resulting in significant improvements in Qwen2-72B-Instruct's capabilities in various programming languages. In mathematics, by leveraging extensive and high-quality datasets, Qwen2-72B-Instruct has demonstrated stronger abilities in solving mathematical problems.

Long Context Understanding

In Qwen2, all instruction-tuned models were trained on 32K-length contexts and use techniques such as YaRN or Dual Chunk Attention to extrapolate to longer context lengths at inference time.

The chart below shows our test results on Needle in a Haystack. Notably, Qwen2-72B-Instruct handles information-extraction tasks flawlessly within a 128K context; coupled with its inherently strong performance, this makes it the preferred choice for long-text tasks when resources are sufficient.

Additionally, it is worth noting the impressive capabilities of the other models in the series: Qwen2-7B-Instruct almost perfectly handles a context length of up to 128k, Qwen2-57B-A14B-Instruct manages up to 64k, and the two smaller models in the series support 32k.
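
For readers unfamiliar with the benchmark, a Needle in a Haystack test case can be built roughly as follows; the filler text, needle sentence, and token estimate are illustrative and are not the data behind the chart above.

```python
# Minimal sketch of a needle-in-a-haystack test case: hide a "needle" fact at a
# chosen depth inside long filler text, then ask the model to retrieve it.
# The filler, needle, and question are illustrative, not the benchmark's data.
def build_niah_prompt(needle: str, depth: float, target_tokens: int) -> str:
    filler_sentence = "The grass is green and the sky is blue. "
    n_sentences = max(target_tokens // 10, 1)  # rough ~10 tokens per sentence
    insert_at = int(n_sentences * depth)
    sentences = [filler_sentence] * n_sentences
    sentences.insert(insert_at, needle + " ")
    haystack = "".join(sentences)
    return (
        f"{haystack}\n\n"
        "Based only on the text above, what is the secret passphrase?"
    )


prompt = build_niah_prompt(
    needle="The secret passphrase is 'blue-lantern-42'.",
    depth=0.5,            # place the needle halfway through the context
    target_tokens=32000,  # scale up toward 128K to probe longer contexts
)
```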

In addition to long context models, we have also open-sourced an agent solution for efficiently processing documents containing up to 1 million tokens. For more details, please refer to our dedicated blog post on this topic.

Safety and Responsibility

The table below shows the proportion of harmful responses generated by large models for four categories of multilingual unsafe queries (illegal activities, fraud, pornography, privacy violence). The test data comes from Jailbreak prompts translated into multiple languages for evaluation. We found that Llama-3 cannot effectively handle multilingual prompts, so it was not included in the comparison. Through significance testing (p-value), we found that Qwen2-72B-Instruct's safety performance is comparable to GPT-4 and significantly better than Mistral-8x22B.

Proportion of harmful responses (lower is better); each cell lists GPT-4 / Mistral-8x22B / Qwen2-72B-Instruct.

| Language | Illegal Activities | Fraud | Pornography | Privacy Violence |
|---|---|---|---|---|
| Chinese | 0% / 13% / 0% | 0% / 17% / 0% | 43% / 47% / 53% | 0% / 10% / 0% |
| English | 0% / 7% / 0% | 0% / 23% / 0% | 37% / 67% / 63% | 0% / 27% / 3% |
| Spanish | 0% / 13% / 0% | 0% / 7% / 0% | 15% / 26% / 15% | 3% / 13% / 0% |
| Portuguese | 0% / 7% / 0% | 3% / 0% / 0% | 48% / 64% / 50% | 3% / 7% / 3% |
| French | 0% / 3% / 0% | 3% / 3% / 7% | 3% / 19% / 7% | 0% / 27% / 0% |
| Korean | 0% / 4% / 0% | 3% / 8% / 4% | 17% / 29% / 10% | 0% / 26% / 4% |
| Japanese | 0% / 7% / 0% | 3% / 7% / 3% | 47% / 57% / 47% | 4% / 26% / 4% |
| Russian | 0% / 10% / 0% | 7% / 23% / 3% | 13% / 17% / 10% | 13% / 7% / 7% |
| Arabic | 0% / 4% / 0% | 4% / 11% / 0% | 22% / 26% / 22% | 0% / 0% / 0% |
| Average | 0% / 8% / 0% | 3% / 11% / 2% | 27% / 39% / 31% | 3% / 16% / 2% |