Volc Engine announced the official release of Doubao Large Model 1.6-vision today on its official Weibo account. According to the announcement, it is the first visual deep-thinking model in the Doubao Large Model family with tool-calling capabilities. It offers stronger general multimodal understanding and reasoning and supports the Responses API.
Doubao Large Model 1.6-vision offers three main advantages:
Tool calling for more accurate visual understanding. Using tool calling as its distinguishing capability, the model integrates images into its chain of thought and can process them precisely: locating regions, cropping, selecting points, drawing lines, scaling, and rotating. By simulating the human visual reasoning process of "global scanning to local focus," it makes its reasoning more interpretable while completing image operations efficiently and accurately.
More efficient application development. The model supports the Responses API, which lets developers choose which tools the model calls. This significantly reduces the amount of code needed in Agent development and improves development efficiency.
Higher cost-effectiveness. Compared with the previous visual understanding model, Doubao-1.5-thinking-vision-pro, the overall cost is about 50% lower, delivering stronger performance at a lower cost.
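To make the first advantage more concrete, the "global scanning to local focus" idea can be sketched as a loop that applies a sequence of image tool calls and records each step. This is only an illustrative sketch and not Doubao's actual interface: the tool names (`crop`, `rotate90`, `scale`), the call format, and the `run_tool_calls` helper are all hypothetical, and the image is a plain grid of numbers rather than a real picture.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical stand-in for an image: rows of grayscale pixel values.
Image = List[List[int]]


def crop(img: Image, top: int, left: int, height: int, width: int) -> Image:
    """Keep only the rectangle starting at (top, left)."""
    return [row[left:left + width] for row in img[top:top + height]]


def rotate90(img: Image) -> Image:
    """Rotate the image 90 degrees clockwise."""
    return [list(col) for col in zip(*img[::-1])]


def scale(img: Image, factor: int) -> Image:
    """Nearest-neighbour upscaling by an integer factor."""
    out: Image = []
    for row in img:
        wide = [pixel for pixel in row for _ in range(factor)]
        out.extend(list(wide) for _ in range(factor))
    return out


# Hypothetical tool registry; a real model would request tools by name.
TOOLS: Dict[str, Callable[..., Image]] = {
    "crop": crop,
    "rotate90": rotate90,
    "scale": scale,
}


def run_tool_calls(img: Image, calls: List[dict]) -> Tuple[Image, List[tuple]]:
    """Apply tool calls in order and keep a trace of (tool, rows, cols)."""
    trace = []
    for call in calls:
        img = TOOLS[call["name"]](img, **call.get("args", {}))
        trace.append((call["name"], len(img), len(img[0])))
    return img, trace


# Global view: a 4x4 image. Local focus: crop the centre, rotate, enlarge.
image = [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9, 10, 11], [12, 13, 14, 15]]
calls = [
    {"name": "crop", "args": {"top": 1, "left": 1, "height": 2, "width": 2}},
    {"name": "rotate90"},
    {"name": "scale", "args": {"factor": 2}},
]
focused, trace = run_tool_calls(image, calls)
print(trace)
```

The trace is what makes a process like this inspectable: each step names the operation and the resulting image size, which loosely mirrors the interpretability benefit described in the announcement.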