On September 30, the Qwen team released Qwen3-LiveTranslate-Flash, a multilingual real-time audio and video translation system built on large language models and presented as a major advance in cross-language communication.

The system supports both offline and real-time translation across 18 languages. These include major languages such as Chinese, English, French, German, Russian, and Spanish, as well as Chinese varieties such as Mandarin, Cantonese, the Beijing dialect, and Wu.

The core innovation of Qwen3-LiveTranslate-Flash is visual context enhancement. Beyond processing speech, the system draws on multimodal cues, recognizing lip movements, gestures, on-screen text, and entities, to understand the surrounding context. This improves translation accuracy in noisy environments and complex contexts and helps resolve ambiguities such as words with multiple meanings.

To keep latency low, the system combines a lightweight mixture-of-experts (MoE) architecture with a dynamic sampling strategy, bringing simultaneous interpretation latency down to as little as 3 seconds and making real-time output noticeably smoother. A semantic unit prediction technique also mitigates word-order differences between languages, so that output quality approaches that of offline translation.
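The announcement does not describe how Qwen's MoE layer is built. As a generic illustration only, the following sketch shows standard top-k expert routing, the mechanism that lets an MoE model run just a few experts per token instead of the whole network, which is one common reason such architectures are chosen to reduce inference cost. The function names `top_k_route` and `moe_forward`, the toy experts, and `k=2` are hypothetical and are not taken from Qwen3-LiveTranslate-Flash.

```python
import math
from typing import Callable, List, Sequence, Tuple

Vector = List[float]
Expert = Callable[[Vector], Vector]


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a short list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def top_k_route(gate_scores: Sequence[float], k: int) -> List[Tuple[int, float]]:
    """Pick the k highest-scoring experts (hypothetical helper).

    Weights are renormalized with a softmax over the selected experts only,
    a common top-k gating scheme.
    """
    ranked = sorted(range(len(gate_scores)), key=lambda i: gate_scores[i], reverse=True)[:k]
    weights = softmax([gate_scores[i] for i in ranked])
    return list(zip(ranked, weights))


def moe_forward(x: Vector, gate_scores: Sequence[float], experts: Sequence[Expert],
                k: int = 2) -> Tuple[Vector, List[Tuple[int, float]]]:
    """Combine the outputs of the routed experts as a weighted sum."""
    routes = top_k_route(gate_scores, k)
    out = [0.0] * len(x)
    for idx, weight in routes:
        y = experts[idx](x)  # only the selected experts are evaluated
        out = [o + weight * yi for o, yi in zip(out, y)]
    return out, routes


if __name__ == "__main__":
    # Toy experts: each one just scales its input by a different constant.
    toy_experts = [lambda v, s=s: [s * vi for vi in v] for s in (1.0, 2.0, 3.0, 4.0)]
    output, chosen = moe_forward([1.0, 1.0], [0.1, 1.0, 0.5, 2.0], toy_experts, k=2)
    print("routed to:", chosen)
    print("output:", output)
```

With four experts and `k=2`, only half of the expert computation runs for each input; in a real model the gate scores would come from a learned router rather than being passed in by hand.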

Test results show Qwen3-LiveTranslate-Flash significantly outperforming mainstream models such as Gemini-2.5-Flash, GPT-4o-Audio-Preview, and Voxtral Small-24B in translation accuracy on Chinese-English and multilingual tasks, with strong performance across multiple domains and in complex acoustic environments.