Qwen has recently launched its latest speech recognition model, Qwen3-ASR-Flash. Built on the Qwen3 foundation model and trained on massive multimodal data plus tens of millions of hours of automatic speech recognition (ASR) data, the model aims to deliver high-accuracy, robust speech recognition.


Qwen3-ASR-Flash's headline features are leading recognition accuracy and strong singing recognition. The model performs well on multiple Chinese, English, and multilingual benchmarks, and in actual tests its error rate on singing recognition is below 8%. In practice, this means it can transcribe both unaccompanied vocals and complete songs with background music.
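The article does not say which metric the "below 8%" figure refers to. ASR accuracy is commonly reported as word error rate (WER) for English or character error rate (CER) for Chinese. Both are the Levenshtein edit distance (substitutions, deletions, and insertions) between the reference transcript and the model's output, divided by the length of the reference. The sketch below assumes that definition. The function names `edit_distance` and `error_rate` are hypothetical, and this is not Qwen's evaluation code.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two token sequences.

    Substitutions, deletions, and insertions each cost 1.
    """
    # prev[j] holds the distance between the first i-1 reference tokens
    # and the first j hypothesis tokens (a single rolling row of the DP table).
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr[j] = min(
                prev[j] + 1,         # delete a reference token
                curr[j - 1] + 1,     # insert a hypothesis token
                prev[j - 1] + cost,  # substitute (or match when cost is 0)
            )
        prev = curr
    return prev[-1]


def error_rate(reference, hypothesis, unit="word"):
    """Word error rate (unit="word") or character error rate (unit="char")."""
    if unit == "word":
        # Whitespace tokenization, typical for English WER.
        ref, hyp = reference.split(), hypothesis.split()
    elif unit == "char":
        # Character tokens with spaces dropped, typical for Chinese CER.
        ref = list(reference.replace(" ", ""))
        hyp = list(hypothesis.replace(" ", ""))
    else:
        raise ValueError("unit must be 'word' or 'char'")
    if not ref:
        raise ValueError("reference must be non-empty")
    return edit_distance(ref, hyp) / len(ref)
```

Under this definition, an error rate below 8% means fewer than 8 substitutions, deletions, or insertions per 100 reference words (or characters). Note that the rate can exceed 100% when the model inserts many extra tokens.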

Another notable feature is customizable recognition. Users can supply background text in any format, and the model uses it to identify and match named entities and key terms, producing transcriptions tailored to that context. This makes Qwen3-ASR-Flash more flexible when handling domain-specific or otherwise complex content.

Qwen3-ASR-Flash also supports 11 languages along with a range of dialects and accents. Supported varieties include Mandarin and major Chinese dialects (such as Sichuanese and Cantonese), British and American English, French, German, Russian, Italian, Spanish, Japanese, Korean, and Arabic. This broad coverage lets the model serve users across many regions and languages.


Qwen3-ASR-Flash is also robust: it maintains high accuracy on long, difficult sentences, on speech that switches languages mid-sentence, and in challenging acoustic environments. It also filters out non-speech segments such as silence and background noise, so they do not end up in the transcript.

Users can try Qwen3-ASR-Flash on several platforms, including ModelScope, Hugging Face, and the Alibaba Cloud Bailian API.

Looking ahead, Qwen says it will continue to iterate on Qwen3-ASR-Flash, improving recognition accuracy and adding features, with the goal of delivering smarter and more efficient speech-to-text services. Qwen hopes this work will open up new possibilities in the field of speech recognition.