NVIDIA released Llama Nemotron Nano VL on June 3, 2025. This compact visual-language model (VLM) is optimized for document intelligence. It ranks first on the OCRBench v2 benchmark, reflecting its strength in handling complex documents, charts, and video frames. With efficient inference and flexible deployment options, Llama Nemotron Nano VL gives enterprises high-precision document processing from the cloud to edge devices.

Llama Nemotron Nano VL: A Compact and Efficient Document Processing Tool

Llama Nemotron Nano VL is based on Meta’s Llama 3.1 architecture, combined with the lightweight visual encoder CRadioV2-H. With only 8B parameters, it performs strongly on document understanding tasks. The model accepts multi-modal inputs and covers complex scenarios such as multi-page documents, scanned tables, financial reports, and technical charts. Its context length of up to 16K tokens makes it suitable for long-document processing and multi-hop reasoning.
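
To make the 16K-token context limit concrete, here is a minimal sketch of how a multi-page document might be split into batches that each fit the window. It is not taken from NVIDIA's tooling: the function name `batch_pages`, the exact limit of 16,384 tokens, the 1,024 tokens reserved for the prompt and answer, and the per-page token counts are all assumptions for illustration. Real per-page counts depend on the model's image tiling and tokenizer.

```python
from typing import List


def batch_pages(page_tokens: List[int],
                context_limit: int = 16_384,   # assumed reading of "16K"
                reserved: int = 1_024) -> List[List[int]]:
    """Greedily group consecutive page indices so each group fits the budget.

    page_tokens[i] is the (hypothetical) token cost of page i.
    reserved is headroom kept free for the question and the generated answer.
    """
    budget = context_limit - reserved
    batches: List[List[int]] = []
    current: List[int] = []
    used = 0
    for idx, cost in enumerate(page_tokens):
        if cost > budget:
            raise ValueError(f"page {idx} alone exceeds the token budget")
        if used + cost > budget:
            # Current batch is full: close it and start a new one.
            batches.append(current)
            current, used = [], 0
        current.append(idx)
        used += cost
    if current:
        batches.append(current)
    return batches


# Hypothetical per-page token costs for a five-page scan.
print(batch_pages([6000, 6000, 6000, 2000, 9000]))  # [[0, 1], [2, 3], [4]]
```

Greedy batching keeps pages in reading order, which matters for multi-hop questions that span adjacent pages. Questions that need evidence from pages in different batches would need a further merge step that this sketch does not cover.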

Its core advantage is efficient inference. With 4-bit AWQ quantization, the model can run on a single NVIDIA RTX GPU or a Jetson Orin edge device, significantly reducing deployment costs. This makes Llama Nemotron Nano VL a strong choice for enterprises that need to run AI agents in resource-constrained environments.
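
A rough back-of-the-envelope calculation shows why 4-bit quantization matters for single-GPU and edge deployment. The sketch below estimates the memory needed for the weights alone, using the 8B parameter count given above. The helper `weight_memory_gib` is illustrative, and the estimate deliberately ignores the quantization scales and zero-points that AWQ stores alongside the weights, any components kept at higher precision, activations, and the KV cache, so real usage will be higher.

```python
def weight_memory_gib(num_params: float, bits_per_weight: float) -> float:
    """Approximate memory for the model weights alone, in GiB."""
    total_bytes = num_params * bits_per_weight / 8
    return total_bytes / 2**30


PARAMS = 8e9  # 8B parameters, as stated in the article

fp16 = weight_memory_gib(PARAMS, 16)  # 16-bit baseline
int4 = weight_memory_gib(PARAMS, 4)   # 4-bit quantized weights

print(f"FP16 weights:  {fp16:.1f} GiB")   # FP16 weights:  14.9 GiB
print(f"4-bit weights: {int4:.1f} GiB")   # 4-bit weights: 3.7 GiB
```

Under these assumptions the weights shrink by a factor of four, from roughly 15 GiB to under 4 GiB, which is the difference between needing a data-center GPU and fitting within the memory of a consumer RTX card or a Jetson Orin module.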

Top of OCRBench v2: Leading Document Parsing Capability

Llama Nemotron Nano VL achieved the highest score on the OCRBench v2 benchmark, surpassing other compact visual-language models. OCRBench v2 includes over 10,000 human-verified question-and-answer pairs drawn from documents in fields such as finance, healthcare, law, and scientific publishing. Its tasks include optical character recognition (OCR), table parsing, and chart reasoning.

The model excels at extracting structured data (such as tables and key-value pairs) and answering layout-based questions, and it remains robust on non-English documents and low-quality scans. Its precision and generalization ability make it well suited to automated document question-answering, intelligent OCR, and information extraction.

Flexible Deployment: Empowering Enterprises Across Multiple Scenarios

Llama Nemotron Nano VL can be deployed anywhere from data centers to edge devices, and it is compatible with NVIDIA's TensorRT-LLM framework for efficient operation on GPU-accelerated systems. Enterprises can customize it through NVIDIA NeMo microservices to meet domain-specific needs such as financial analysis, medical record processing, or legal document review.

In addition, the model supports single-image and video inference, making it suitable for tasks such as image summarization, text-image analysis, and interactive question-answering. Its open licensing (under the NVIDIA Open Model License and Llama 3.1 Community License) allows commercial use, giving developers the freedom to build customized AI agents.

NVIDIA's Strategy in Agentic AI

Llama Nemotron Nano VL is an important part of NVIDIA’s Nemotron model family and reflects the company's continued investment in agentic AI. By combining the Llama architecture with NVIDIA’s optimization technologies, the model both improves inference efficiency and sets a new benchmark in document processing.

NVIDIA also plans to expand the model's capabilities through the NeMo framework and NIM microservices, supporting more multi-modal tasks such as video search and physics-aware video generation. This indicates that NVIDIA is committed to building a comprehensive AI ecosystem spanning edge to cloud, providing strong support for enterprise digital transformation.

The release of Llama Nemotron Nano VL marks a new breakthrough in the enterprise-level application of compact visual-language models. Its efficiency and high precision open up new possibilities for automated document processing, knowledge management, and intelligent collaboration. AIbase will continue to track NVIDIA's latest developments in the AI field, providing readers with cutting-edge technological insights.

Access: https://huggingface.co/nvidia/Llama-3.1-Nemotron-Nano-VL-8B-V1