On July 30, at the FORCE Link AI Innovation Tour - Xiamen Station event it hosted, ByteDance Engine released new models in the Doubao series and upgraded its AI cloud-native services, including the Doubao Image Editing Model 3.0, the Doubao Simultaneous Interpretation Model 2.0, and the upgraded Doubao Large Model 1.6 series. It also open-sourced the core capabilities of Coze, launched an enterprise self-hosted model solution, and released other tools, providing full-stack support for enterprises and developers to build Agents and put AI applications into production.
Figure: Zhang Tan, President of ByteDance Engine, announced the latest Doubao models
The New Models in the Doubao Series Are Now Available to Enterprises
To address common pain points in AI image editing, namely "not understanding instructions, modifying the wrong content, and producing poor results," ByteDance Engine introduced the Doubao Image Editing Model 3.0 (SeedEdit 3.0). The model improves instruction following, image preservation, and generation quality, letting users remove redundant elements, adjust lighting, or replace components through natural language alone. It also enables creative photo-editing scenarios such as style conversion, material transformation, and pose adjustment, and is broadly applicable to fields like image creation and advertising and marketing. Enterprise users can call its API via Volcano Ark, while individual users can try it in Ji Meng or the Doubao app.
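Since the article does not give the API schema, the following is a minimal sketch of what an image-editing call through an HTTP gateway such as Volcano Ark might look like. The endpoint URL, model identifier, and field names are illustrative assumptions, not the documented interface.

```python
# Hypothetical sketch: edit an image with a natural-language instruction.
# Endpoint, model ID, and payload fields are assumptions for illustration.
import base64
import requests

ARK_ENDPOINT = "https://example-ark-gateway/api/v3/images/edits"  # hypothetical URL
API_KEY = "YOUR_API_KEY"

with open("product_photo.png", "rb") as f:
    image_b64 = base64.b64encode(f.read()).decode()

payload = {
    "model": "doubao-seededit-3.0",   # assumed model identifier
    "image": image_b64,               # source image, base64-encoded
    "prompt": "Remove the passerby in the background and warm up the lighting",
}

resp = requests.post(
    ARK_ENDPOINT,
    headers={"Authorization": f"Bearer {API_KEY}"},
    json=payload,
    timeout=60,
)
resp.raise_for_status()
print(resp.json())  # expected to contain the edited image or a URL to it
```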
The newly released Doubao Simultaneous Interpretation Model 2.0 (Seed-LiveInterpret 2.0) breaks through the limitations of traditional "cascaded" pipelines by adopting a full-duplex framework, cutting speech latency from 8-10 seconds to 2-3 seconds and enabling real-time synchronization of text and voice. It also supports zero-shot voice cloning, reproducing a speaker's voice without prior recordings and even matching regional accents, which significantly improves the immersion of cross-language communication.
The Doubao Large Model 1.6 series has also been upgraded. Among them, the fast variant, Doubao-Seed-1.6-flash, keeps strong visual understanding while improving code, reasoning, and math abilities, making it suitable for large-scale commercial scenarios such as intelligent inspection and mobile phone assistants. Its TPOT (time per output token) is as low as 10 ms, an industry-leading figure. On cost, within the 0-32k input length range most commonly used by enterprises, it charges only 0.15 yuan per million input tokens and 1.5 yuan per million output tokens. In customer use cases it has delivered a 60% reduction in latency and a 70% reduction in cost.
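To make the pricing concrete, here is a small arithmetic check based on the published rates for the 0-32k range; the workload figures are assumed purely for illustration.

```python
# Cost estimate for Doubao-Seed-1.6-flash at the published 0-32k pricing:
# 0.15 yuan per million input tokens, 1.5 yuan per million output tokens.
input_price_per_m = 0.15   # yuan / 1M input tokens
output_price_per_m = 1.50  # yuan / 1M output tokens

# Assumed example workload: 20M input tokens and 2M output tokens per day.
input_tokens = 20_000_000
output_tokens = 2_000_000

cost = (input_tokens / 1_000_000) * input_price_per_m \
     + (output_tokens / 1_000_000) * output_price_per_m
print(f"Estimated daily cost: {cost:.2f} yuan")  # 3.00 + 3.00 = 6.00 yuan
```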
Additionally, the fully multimodal vectorization model Seed1.6-Embedding is the first to achieve fused "text + image + video" multimodal retrieval, helping enterprises build more capable multimodal knowledge bases. It achieved the best results in authoritative benchmarks on comprehensive multimodal tasks and Chinese text.
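Embedding-based retrieval of this kind typically works by mapping queries and candidate items into one vector space and ranking by similarity. The sketch below shows that pattern; the embed() function is a deterministic placeholder standing in for whatever client the Seed1.6-Embedding API actually exposes for text, image, or video inputs.

```python
# Sketch of multimodal retrieval with an embedding model: embed the query and
# each candidate item, then rank by cosine similarity.
import hashlib
import numpy as np

def embed(content: str) -> np.ndarray:
    """Placeholder embedding: a deterministic pseudo-random unit vector derived
    from a content hash. In practice, replace this with a call to the real
    multimodal embedding API (text, image, or video input)."""
    seed = int.from_bytes(hashlib.sha256(content.encode()).digest()[:8], "big")
    rng = np.random.default_rng(seed)
    v = rng.standard_normal(256)
    return v / np.linalg.norm(v)

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def search(query: str, corpus: list[str], top_k: int = 3) -> list[str]:
    q = embed(query)
    return sorted(corpus, key=lambda item: cosine(q, embed(item)), reverse=True)[:top_k]

# Toy corpus mixing descriptions of text, image, and video assets.
docs = [
    "product photo of a red sneaker",
    "tutorial video on studio lighting",
    "press release text about a new model",
]
print(search("red shoes marketing image", docs))
```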
Optimizing AI Cloud-Native Services to Accelerate Agent Development and Deployment
To support end-to-end Agent development and deployment, ByteDance Engine continues to optimize its full-stack AI cloud-native services. On July 26, the core capabilities of the AI Agent development platform Coze were officially open-sourced, including the one-stop visual development tool "Coze Studio" and the full-lifecycle management tool "Coze Loop." Both are released under the Apache 2.0 license and can be downloaded from GitHub. Within three days of the release, Coze Studio passed 10,000 GitHub stars and Coze Loop passed 3,000. ByteDance Engine provides full supporting services: its enterprise AI platform HiAgent can call these capabilities, and its cloud products support one-click deployment.
For enterprises that need customized models, ByteDance Engine offers a self-hosted model solution based on Volcano Ark model units. Enterprises no longer need to manage underlying GPU resources or complex configurations; they get fully managed hosting of their self-developed models, with elastic computing power, a choice of deployment methods and machine types, precise latency control, and no charges during low-traffic periods. The solution is currently in invite-only testing.
At the same time, Volcano Ark upgraded its API system and launched the Responses API. The API has native context management, supports multi-turn dialogue chaining, and connects multimodal data such as text and images. Combined with caching, it can cut costs by 80%. It also supports chaining multiple tools and model combinations in a single request, reducing the development of a smart-assistant Agent from about 460 lines of code and 1-2 days of work to about 60 lines and roughly one hour, a significant efficiency gain.
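The article does not show the request format, so the following is a hedged sketch of how a Responses-style API with server-side context management might be called across two turns. The endpoint, model ID, and field names (including previous_response_id) are assumptions modeled on common Responses-style interfaces, not the documented Volcano Ark schema.

```python
# Hypothetical sketch: two chained turns through a Responses-style API.
# The server keeps the conversation context, so turn 2 only references
# turn 1 instead of resending the full history; together with caching,
# this is what enables the reported cost reduction.
import requests

ENDPOINT = "https://example-ark-gateway/api/v3/responses"  # hypothetical URL
HEADERS = {"Authorization": "Bearer YOUR_API_KEY"}

# Turn 1: send text plus an image reference (multimodal input).
first = requests.post(ENDPOINT, headers=HEADERS, json={
    "model": "doubao-seed-1.6",  # assumed model identifier
    "input": [
        {"type": "text", "text": "What product is shown in this picture?"},
        {"type": "image_url", "image_url": "https://example.com/item.png"},
    ],
}, timeout=60).json()

# Turn 2: reference the previous response instead of resending the history.
second = requests.post(ENDPOINT, headers=HEADERS, json={
    "model": "doubao-seed-1.6",
    "previous_response_id": first.get("id"),  # assumed chaining field
    "input": [{"type": "text", "text": "Write a one-sentence ad slogan for it."}],
}, timeout=60).json()

print(second)
```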
This series of releases further rounds out ByteDance Engine's AI ecosystem, giving enterprises and developers full-chain support from foundation models to development tools and accelerating the adoption of AI across industries.