Baidu Launches Qianfan-VL Model, Multiple Sizes Meet Different Scenario Needs

AIbase基地

Published in AI News · 4 min read · Sep 23, 2025

The Baidu Intelligent Cloud Qianfan team has officially launched Qianfan-VL, a new visual understanding model, and fully open-sourced it. The series comes in three sizes, 3B, 8B, and 70B, and targets enterprise-level multimodal applications. After deep optimization, it demonstrates strong visual understanding capabilities.

Beyond solid general-purpose capabilities, Qianfan-VL has been specifically enhanced for high-frequency industry needs such as optical character recognition (OCR) and educational scenarios, which improves its performance in practical use. The model is built on open-source models, and the entire computing workflow was completed on Baidu's self-developed Kunlun Chip P800, whose computing power lets the model process complex data and algorithms efficiently.

The new model has three notable features. First, its three sizes (3B, 8B, and 70B) let enterprises and developers of different scales choose a suitable option for their application needs. Second, the 8B and 70B models have thinking and reasoning capabilities that, through special tokens, let them handle complex chart understanding, visual reasoning, and math problem-solving tasks. Finally, it performs exceptionally well in OCR and document understanding: it accurately recognizes handwritten text and complex layouts, and it can also extract structured information.

In benchmark tests, the Qianfan-VL series showed strong general capabilities as well as strong results on specific tasks, with high accuracy in both visual understanding and professional-domain Q&A. In OCR and document understanding in particular, its full-scenario recognition and complex document analysis give enterprise applications a high-precision option.

Qianfan-VL's mathematical problem-solving capabilities are also worth noting. The 8B and 70B models perform well on complex reasoning tasks by combining visual information with external knowledge. In practical applications, the model can extract key information and perform data analysis, helping enterprises make intelligent decisions.

The launch of Qianfan-VL marks a major breakthrough for Baidu in visual understanding. We look forward to the new wave of applications it may set off across industries.

This article is from AIbase Daily
