Alibaba's artificial intelligence division today officially launched compact versions of its Qwen3-VL vision-language model series, introducing variants with 4 billion and 8 billion parameters. The release is a significant step toward bringing advanced multimodal AI to edge devices and resource-constrained environments.

Performance Leap, Small Models Compete with Giants

The newly released 4B and 8B models are each available in Instruct and Thinking versions, and are optimized for core multimodal capabilities such as STEM reasoning, visual question answering (VQA), optical character recognition (OCR), video understanding, and agent tasks.

According to published benchmark test results, these small models perform exceptionally well across multiple categories, surpassing competitors such as Gemini 2.5 Flash Lite and GPT-5 Nano. More notably, their performance in certain areas can even match that of the larger Qwen2.5-VL-72B model released just six months ago, demonstrating high parameter efficiency.

Resource Optimization, Promoting AI Democratization

The key highlight of the new models is their significantly reduced VRAM usage, which allows them to run directly on consumer-grade hardware such as laptops and smartphones. To improve efficiency further, Alibaba also provides FP8-quantized versions, which reduce resource consumption without sacrificing core capabilities. As one Qwen team member involved in the development said: "Small VL models are suitable for deployment and have significant implications in mobile and robotics fields."
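
To put the VRAM claim in rough numbers, the following is a back-of-the-envelope sketch rather than a measured figure. It assumes nominal parameter counts of exactly 4 billion and 8 billion, 2 bytes per parameter for BF16 and 1 byte for FP8, and it counts model weights only, leaving out activations, the KV cache, and runtime overhead. The function name `weight_memory_gb` is illustrative and not part of any Qwen tooling.

```python
# Rough weight-only memory estimate (assumption: nominal parameter counts;
# activations, KV cache, and framework overhead are NOT included).
BYTES_PER_PARAM = {"bf16": 2, "fp8": 1}  # storage width per parameter


def weight_memory_gb(num_params: float, dtype: str) -> float:
    """Return approximate weight storage in gigabytes (1 GB = 1e9 bytes)."""
    return num_params * BYTES_PER_PARAM[dtype] / 1e9


if __name__ == "__main__":
    # Hypothetical nominal sizes for the two compact variants.
    for name, params in [("4B", 4e9), ("8B", 8e9)]:
        for dtype in ("bf16", "fp8"):
            print(f"{name} {dtype}: ~{weight_memory_gb(params, dtype):.0f} GB")
```

Under these assumptions, FP8 halves the weight footprint relative to BF16, which is why the quantized versions are the more plausible fit for laptops and phones. Actual memory use will be higher once inference overhead is included.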

Fast Iteration, Open Source Sharing

The compact release continues the roadmap of the Qwen3-VL series, which launched in September with a 235-billion-parameter flagship model. In early October, Alibaba released the 30B-A3B variant, which achieved benchmark results comparable to GPT-5 Mini and Claude 4 Sonnet with 30 billion total parameters, of which only about 3 billion are active (as the "A3B" in its name indicates). The industry sees this rapid iteration as a strong demonstration of Alibaba's push to democratize high-performance AI, with the small models especially well suited to embodied systems such as robots.

Links:

https://huggingface.co/collections/Qwen/qwen3-vl-68d2a7c1b8a8afce4ebd2dbe

https://github.com/QwenLM/Qwen3-VL/tree/main/cookbooks