Volcano Engine has officially launched the Doubao Large Model 2.0 (Doubao-Seed-2.0) series and simultaneously opened API services to enterprises and developers. Individual users can try it through the Volcano Ark Experience Center or the "Expert" mode in the Doubao app.

This version has been systematically optimized for large-scale production environments. With efficient reasoning, multimodal understanding, and complex instruction execution, it is better equipped to handle complex real-world tasks. Its inference cost is about one order of magnitude lower than that of industry-leading models, and daily token usage has grown more than 500-fold since the model's initial release.

Doubao Large Model 2.0 comes in four differentiated models for scenarios with different latency and cost requirements. The Pro version is the flagship, focused on complex deep reasoning and Agent tasks. The Lite version outperforms version 1.8 while consuming fewer tokens, making it highly cost-effective. The Mini version prioritizes speed and cost, with capabilities comparable to version 1.6 Pro. The Code version is optimized for developers and real programming environments, and performs even better when used with TRAE.
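The four-way split above can be read as a simple routing rule. The following is a minimal sketch only: the `Task` fields, the tier labels `"pro"`, `"lite"`, `"mini"`, and `"code"`, and the order in which the checks run are all illustrative assumptions, not Volcano Ark model identifiers or an official selection policy.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Task:
    """Hypothetical description of a workload (not part of any real SDK)."""
    is_coding: bool = False
    needs_deep_reasoning: bool = False
    latency_or_cost_critical: bool = False


def choose_tier(task: Task) -> str:
    """Map a task to one of the four tiers described in the article.

    The returned strings are placeholder labels, and the priority order
    (coding, then deep reasoning, then speed/cost) is an assumption.
    """
    if task.is_coding:
        return "code"   # developer-oriented tier
    if task.needs_deep_reasoning:
        return "pro"    # flagship: deep reasoning and Agent tasks
    if task.latency_or_cost_critical:
        return "mini"   # prioritizes speed and cost
    return "lite"       # general-purpose, cost-effective default


if __name__ == "__main__":
    print(choose_tier(Task(needs_deep_reasoning=True)))
    print(choose_tier(Task(latency_or_cost_critical=True)))
```

In practice the placeholder labels would be replaced by whatever model identifiers the platform actually exposes, and the rule would be tuned against measured latency and cost.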

This update comprehensively upgrades multimodal understanding and brings visual understanding to a world-class level. The Pro version leads Gemini 3 Pro on evaluations including MMSIBench (spatial understanding), MotionBench (motion understanding), and VideoMME (video understanding), and its chart understanding on CharXiv-RQ has also improved significantly.

For video scenarios, the model has strengthened its understanding of temporal sequences and motion. It leads on key evaluations such as TVBench, and its EgoTempo score exceeds the human score. On long-video evaluations it surpasses most top models, which enables interactions such as real-time video stream analysis and proactive guidance. This makes it suitable for companion scenarios such as fitness and fashion: it can accurately infer billiard movements, identify sports actions, and provide professional guidance.

The model's LLM and Agent capabilities have also been significantly enhanced. With added long-tail domain knowledge, it adapts better to professional scenarios. The Pro version scores higher than GPT-5.2 on SuperGPQA, ranks first on HealthBench, and matches Gemini 3 Pro and GPT-5.2 in scientific fields. It leads all models on HLE-text with 54.2 points, surpasses Gemini 3 Pro on the IMO evaluation, and performs well in tool calling and instruction following. In some scenarios, its STEM benchmark scores exceed those of Gemini 3 Pro.

The model is also more consistent and controllable in instruction following, and excels at long-chain, multi-step tasks. It can complete continuous workflows such as "finding information - summarizing - drawing conclusions," and can combine tools to carry out end-to-end tasks that run from data processing and content creation through image generation and layout. Intelligent customer service agents built on it can deliver full-cycle service, including customer conversations, issue handoff, and after-sales follow-up. In addition, the Code version can reliably call mainstream IDE tools, with significant front-end improvements and support for custom skills. Combined with TRAE, it can greatly improve development efficiency: building a complex web application such as "AI Temple Fair" took only five rounds of prompts, and the related materials have been open-sourced.

To address the surge in token usage in the Agent era, Volcano Engine has also updated its Coding Plan package, which matches models to different programming tasks. Developers can call the model through Volcano Ark, and new users can get started for as little as 8 yuan in the first month.