Volcano Engine Launches Doubao Speech Recognition Model 2.0, Improving Multilingual Recognition Accuracy

AIbase基地

Published in AI News · 4 min read · Dec 5, 2025

Volcano Engine has officially launched the Doubao Speech Recognition Model 2.0 (Doubao-Seed-ASR-2.0). The upgraded model significantly improves reasoning capabilities, adds accurate recognition of multiple languages, and can draw on visual information, marking another major step forward for speech recognition technology.

According to reports, the Doubao Speech Recognition Model 2.0 builds on the previous version's high-performance, 2-billion-parameter audio encoder and focuses on improving performance in complex scenarios. The model has been specifically trained on challenging content such as proper nouns, personal names, place names, and homophones, with the aim of delivering higher accuracy across a wide range of applications. Its reasoning capabilities come from a reinforcement learning scheme based on PPO (Proximal Policy Optimization), which lets it identify the correct words through a deep understanding of context, without needing a prior record of the target words.
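The report does not explain how Volcano Engine applies PPO, so the sketch below only illustrates the generic clipped surrogate objective that Proximal Policy Optimization maximizes. The function name, the default clip range `epsilon=0.2`, the plain-list inputs, and the idea of using a transcription-quality reward as the advantage signal are all illustrative assumptions, not documented details of Doubao-Seed-ASR-2.0.

```python
import math
from typing import Sequence


def ppo_clipped_objective(
    new_log_probs: Sequence[float],
    old_log_probs: Sequence[float],
    advantages: Sequence[float],
    epsilon: float = 0.2,
) -> float:
    """Mean PPO clipped surrogate objective (to be maximized).

    Illustrative only: in an ASR setting each entry might correspond to one
    sampled transcription, with its advantage derived from a reward such as
    negative word error rate. That mapping is an assumption for this sketch.
    """
    if not (len(new_log_probs) == len(old_log_probs) == len(advantages)):
        raise ValueError("all inputs must have the same length")
    if not advantages:
        raise ValueError("inputs must be non-empty")

    total = 0.0
    for new_lp, old_lp, adv in zip(new_log_probs, old_log_probs, advantages):
        # Probability ratio r = pi_new / pi_old, computed from log-probabilities.
        ratio = math.exp(new_lp - old_lp)
        # Clip the ratio to [1 - epsilon, 1 + epsilon] so a single update
        # cannot push the policy too far from the one that produced the samples.
        clipped = min(max(ratio, 1.0 - epsilon), 1.0 + epsilon)
        # Pessimistic bound: keep the smaller of the unclipped and clipped terms.
        total += min(ratio * adv, clipped * adv)
    return total / len(advantages)


# Usage: a sample whose probability rose by 50% with a positive advantage
# only receives credit up to the clipped ratio of 1 + epsilon.
print(ppo_clipped_objective([math.log(1.5)], [0.0], [1.0]))
```

Taking the minimum of the clipped and unclipped terms is what makes the update conservative: gains from moving the policy are capped, while losses are not hidden by the clip.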

Notably, the upgrade gives the Doubao Speech Recognition Model 2.0 multimodal understanding, allowing it to analyze text and visual information together. When a user sends an image, the model can use the image content to inform speech recognition and so understand the user's intent more accurately. For example, when a user describes an image containing a skateboard, a traditional model might mistranscribe the phrase "slid chicken" as its Chinese homophone "funny," whereas the Doubao model can infer from the image that "slid chicken" is the correct term and avoid the error.

In addition, the Doubao Speech Recognition Model 2.0 supports accurate recognition of 13 foreign languages, including Japanese, Korean, German, and French. This multilingual support broadens its use in cross-language applications and improves the interaction experience for users worldwide.

Volcano Engine stated that the Doubao Speech Recognition Model 2.0 is now available in the Volcano Ark (Volcano Fangzhou) Experience Center and through an API, so enterprises and developers can integrate it into their own products. Going forward, Volcano Engine plans to keep evolving the model toward more accurate speech-to-text across multimodal and multi-scenario environments, providing efficient solutions for users.

The release of the Doubao Speech Recognition Model 2.0 demonstrates Volcano Engine's continued innovation and technical strength in artificial intelligence, and it is expected to have a positive impact on industry standards and user experience.

Tags: Doubao Speech Recognition Model 2.0, Volc Engine, PPO Solution, Polyphone Recognition

This article is from AIbase Daily

Welcome to the [AI Daily] column! This is your daily guide to exploring the world of artificial intelligence. Every day, we bring you hot topics in the AI field with a focus on developers, helping you understand technical trends and learn about innovative AI product applications.

Created by the AIbase Daily Team

AI News Recommendations

Meta Tests AI Shopping Features to Compete with ChatGPT and Google

Meta is turning its AI assistant into an e-commerce tool, quietly testing an "AI Shopping Research" feature aimed directly at ChatGPT and Gemini. The feature is now available to some U.S. users and presents product recommendations in an intuitive carousel, as Meta seeks to capture the AI consumer interface.

Mar 3, 2026

AI Daily: MiniMax Releases Its First Financial Report After Listing; Tongyi Open-Sources Qwen3.5 Small Model Series; Claude Code Launches Official Voice Mode

Welcome to the [AI Daily] column! Here is your daily guide to exploring the world of artificial intelligence. Every day, we bring you the latest content in the AI field with a focus on developers, helping you understand technology trends and innovative AI product applications. Discover new AI products: https://app.aibase.com/zh 1. MiniMax Releases Its First Financial Report After Listing: MiniMax has released its first annual financial report since listing, showcasing significant progress and financial performance in its AI platform strategy. 8. DeepS

Mar 3, 2026

DeepSeek V4 Lite Evolves Stealthily: A 200 Billion-Parameter Small Model with Impressive Performance, Approaching Top Overseas Models

As a pre-release version of V4, DeepSeek V4 Lite has attracted attention with 200 billion parameters and a context length of up to 1 million tokens. After continuous upgrades, its performance is comparable to top closed-source models, showing outstanding results in various benchmark tests and demonstrating strong competitiveness.

Mar 3, 2026

QM Releases 2025 AI Application Ranking: Doubao, DeepSeek, Yuanbao, Afu, and Qianwen Rank in Top 5

A QuestMobile report shows that as of December 2025, the top five AI applications by monthly active users are Doubao, DeepSeek, Yuanbao, Ant Afu, and Alibaba Qianwen, with Ant Lingguang entering the top ten. The report indicates that AI applications are shifting from "general coverage" to "scenario penetration": six of the top ten applications are general-purpose AI and four are specialized vertical AI.

Mar 3, 2026

KFC Collaborates with Alibaba Qwen Large Model to Launch AI Ordering Assistant Xiao K, Supporting Full-Process Voice Closed Loop

KFC has introduced the AI ordering assistant "Xiao K," powered by Alibaba's Tongyi Qianwen model and RAG technology, which supports natural language understanding and multi-turn dialogue. Users can state needs such as "10-person meeting, budget 350 yuan" to receive smart meal recommendations, streamlining ordering and improving the experience...

Mar 3, 2026

GPT-5.4 Unexpectedly Surfaces! Leaked in GitHub Source Code, OpenAI's Secret Weapon: 2-Million-Token Context + Stateful AI to End the "Goldfish Memory" Era?

An OpenAI engineer accidentally leaked information about the unreleased GPT-5.4 model in a code repository, causing a stir in the tech community. Although the company quickly corrected it to "gpt-5.3-codex", many believe this was not a simple mistake, suggesting that a major update may be coming in the field of large models.

Mar 3, 2026

Borderless Communication! iFLYTEK AI Glasses Make Their Debut at MWC 2026: 40g Ultra-lightweight, Lip-Movement Recognition Technology Helps Achieve Superior Translation in Noisy Environments

iFLYTEK introduced the "iFLYTEK AI Glasses" at MWC 2026, designed specifically for face-to-face communication. Using multimodal technology, it solves the problems of traditional translation devices in complex environments such as unclear audio and inaccurate translation, achieving an instant translation effect that matches what you see, making communication more natural.

Mar 3, 2026

Alibaba TONGYI Qwen Open-Sources Qwen3.5 Small Model Series: Multimodal Agents Can Run on Edge Devices

The Alibaba TONGYI Qwen team has launched the Qwen3.5 small model series, comprising four lightweight models of 0.8B, 2B, 4B, and 9B parameters, along with their corresponding base versions. The models share a unified architecture with native multimodal capabilities (image-text processing), and through architectural improvements and scalable reinforcement learning training they achieve higher intelligence with fewer computing resources. The 0.8B and 2B models are especially compact and fast at inference, and are optimized specifically for edge devices.

Mar 3, 2026

iFLYTEK AI Glasses Make Debut at MWC 2026: 40-Gram Body Achieves Multimodal Real-Time Translation

iFLYTEK launched AI glasses at MWC 2026, emphasizing lightweight design and multimodal interaction. Its core function is real-time cross-language translation, achieving an "instant visual-to-visual" communication experience by displaying subtitles on the lenses and playing audio through speakers.

Mar 3, 2026

DeepSeek Large Model V4 is About to Launch, Bringing New Opportunities for AI Applications!

DeepSeek's V4 model, launching next week, adds image, video, and text generation, marking the company's first major upgrade since its R1 model in January 2025 and expanding its presence in China's low-cost open-source AI market. Analysts note this may accelerate AI commercialization, especially through high-frequency consumer scenarios during the Spring Festival...

Mar 3, 2026
