Nvidia has officially launched a new small language model, Nemotron-Nano-9B-v2, the latest sign of renewed interest in small models.

The model has 9 billion parameters. Although that is larger than small models in the sub-billion-parameter range, it is a substantial reduction from the 12-billion-parameter model it was derived from, with the goal of running efficiently on a single Nvidia A10 GPU. Oleksii Kuchaiev, Director of AI Model Post-training at Nvidia, said on social media that the parameter count was cut specifically to fit this deployment target. The model uses a hybrid Mamba-Transformer architecture, which Nvidia says can deliver up to six times the throughput of similarly sized pure-transformer models at larger batch sizes.

Nemotron-Nano-9B-v2 supports multiple languages, including English, German, Spanish, French, Italian, and Japanese, and is suited to tasks such as instruction following and code generation. The model also has a notable feature: users can toggle its "reasoning" step, in which it works through a problem before giving a final answer, using simple control tokens. Reasoning traces are generated by default, but the behavior can be switched with the /think or /no_think commands. In addition, the model introduces a "thinking budget" mechanism that lets developers cap the number of tokens spent on reasoning, balancing accuracy against response speed.
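To make the toggle concrete, here is a minimal sketch of how it might be driven from Python with Hugging Face Transformers. The checkpoint name, the use of the system turn to carry /think or /no_think, and the flat cap on generated tokens standing in for the thinking budget are assumptions inferred from the description above, not an official recipe.

```python
# Minimal sketch (assumed checkpoint name and prompt convention, not Nvidia's official example).
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "nvidia/NVIDIA-Nemotron-Nano-9B-v2"  # assumed Hugging Face checkpoint name
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",
    torch_dtype="auto",
    trust_remote_code=True,  # may be required for the hybrid Mamba-Transformer architecture
)

messages = [
    # Assumption: the reasoning toggle is passed in the system turn;
    # switch to "/no_think" to skip the reasoning trace.
    {"role": "system", "content": "/think"},
    {"role": "user", "content": "What is 17 * 24?"},
]

inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

# A crude stand-in for the "thinking budget": cap total new tokens.
# The real mechanism is described as limiting the reasoning-trace tokens specifically,
# so treat this cap as an illustration rather than the actual budget control.
outputs = model.generate(inputs, max_new_tokens=1024)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```

Rerunning the same request with "/no_think" in the system turn should return a direct answer without the intermediate reasoning trace, which is useful when latency matters more than step-by-step accuracy.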

According to published results, Nemotron-Nano-9B-v2 performs well across multiple benchmarks. With reasoning enabled, the model posts strong scores on AIME25, MATH500, GPQA, and LiveCodeBench, and it also does well on instruction-following and long-context benchmarks, with higher accuracy than comparable open-source small models.

Nvidia is releasing the model under an open model license that allows developers to use and distribute it commercially free of charge, and that explicitly disclaims any ownership claim over generated output. This means companies can put the model into production immediately, without additional negotiations, usage restrictions, or fees.

Nvidia's Nemotron-Nano-9B-v2 gives developers a new tool for combining reasoning capability with efficient deployment at small scale. Its thinking-budget control and reasoning toggle offer flexibility to system builders weighing accuracy against response speed, and push the development of small language models further along.

Key Points:

🌟 Nemotron-Nano-9B-v2 is a new small language model from Nvidia with 9 billion parameters, designed specifically for efficient deployment.

🧠 The model supports multiple languages and includes a reasoning toggle, letting users adjust its responses to their needs.

📈 An open license allows developers to freely use and distribute the model commercially, without worrying about additional fees or licensing negotiations.