Alibaba's Tongyi large-model team announced that its "Bailin" series of speech models has received a major upgrade and is now officially open source. After just three seconds of recorded audio, the two updated speech models can switch seamlessly among as many as nine languages and eighteen dialects, including Mandarin, Cantonese, Japanese, and English. They can also simulate emotions such as happiness and anger.

In this upgrade, the Fun-CosyVoice3 model has improved significantly. First-packet latency has been cut by 50%, and accuracy on mixed Chinese-English speech has improved substantially. The model's voice cloning has also been enhanced: from an audio clip of three seconds or longer, users can replicate a voice and synthesize new speech with it. This makes scenarios such as real-time voice assistants, live-stream dubbing, and accessible reading more efficient and convenient.

The Fun-ASR model has also been improved and now reaches 93% recognition accuracy in noisy environments. It recognizes song lyrics and rap, handles free mixing of multiple languages, and covers a range of Chinese dialects and accents. First-character latency in streaming recognition has been reduced to 160 milliseconds, which makes voice interaction noticeably more fluid.

Both models also support local deployment and custom development, so developers can adapt them to their own needs. The open-source repository has been published, and users can try both speech models on the relevant platforms, which should help broaden the use of voice technology across fields.

GitHub: https://github.com/FunAudioLLM/CosyVoice

Key points:

🌐 **Multilingual Support**: Switch among nine languages and eighteen dialects with just three seconds of audio.

⚙️ **Technology Upgrade**: First-packet latency cut by 50% and accuracy improved, making voice interaction smoother.

📦 **Open Source**: Both models support local deployment and custom development, making personalized applications easier to build.