Open-source text-to-speech (TTS) has a notable new entry: the recently released Muyan-TTS, an open-source model designed for podcasts, audiobooks, and long-form videos. It offers zero-shot voice synthesis, fast generation, and coherent long-form reading, which makes it one of the more suitable models currently available for batch-generating long audio content.
Muyan-TTS is pre-trained on more than 100,000 hours of podcast data. It generates one second of high-quality audio in about 0.33 seconds, roughly three times faster than real time, and can read several minutes of text continuously with natural, smooth speech. It also supports speaker customization, so any voice can be cloned to generate personalized content with its own tone and rhythm in very few steps.
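To put the speed figure in concrete terms, the following minimal Python sketch estimates how long a batch job might take. It assumes the quoted 0.33 seconds per second of audio holds uniformly, which in practice depends on hardware and input text. The function and constant names are illustrative and are not part of Muyan-TTS.

```python
# Back-of-the-envelope estimate of generation time from the quoted speed.
# Assumption: 0.33 s of compute per 1 s of audio applies uniformly;
# real throughput varies with hardware, batch size, and text.
SECONDS_PER_AUDIO_SECOND = 0.33


def estimated_generation_seconds(audio_seconds: float,
                                 rate: float = SECONDS_PER_AUDIO_SECOND) -> float:
    """Return the estimated compute time for `audio_seconds` of speech."""
    if audio_seconds < 0:
        raise ValueError("audio_seconds must be non-negative")
    return audio_seconds * rate


if __name__ == "__main__":
    # Hypothetical content lengths, chosen only for illustration.
    for label, minutes in [("5-minute segment", 5),
                           ("45-minute podcast", 45),
                           ("10-hour audiobook", 600)]:
        est = estimated_generation_seconds(minutes * 60)
        print(f"{label}: about {est / 60:.1f} minutes of generation time")
```

Under this assumption, a 45-minute podcast episode would need roughly 15 minutes of generation time.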
The model has been made available on Hugging Face and supports offline deployment. Developers can easily perform local inference and adapt it to various applications such as podcast creation, audiobook production, English video dubbing, AI character narration, smart speaker announcements, and more, significantly boosting content production efficiency.
Interested developers can visit Hugging Face to obtain the model weights and sample code and begin their own AI voice creation projects.
GitHub repository: https://github.com/MYZY-AI/Muyan-TTS
Hugging Face model: https://huggingface.co/MYZY-AI/Muyan-TTS
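For offline deployment, the weights first need to be copied to a local machine. The sketch below shows one possible way to do that with the `huggingface_hub` Python package (installed separately), using the repository ID from the link above. The helper name and the target directory are illustrative assumptions, and nothing is assumed here about the files inside the repository. The inference code itself lives in the GitHub repository and is not reproduced here.

```python
# Download the published model repository for offline use.
# Assumptions: the `huggingface_hub` package is installed, and the local
# directory name below is an arbitrary choice made for this example.
REPO_ID = "MYZY-AI/Muyan-TTS"  # taken from the Hugging Face link above


def download_muyan_tts(local_dir: str = "pretrained_models/Muyan-TTS") -> str:
    """Fetch every file in the model repository and return the local path."""
    # Imported here so the module loads even before the package is installed.
    from huggingface_hub import snapshot_download

    return snapshot_download(repo_id=REPO_ID, local_dir=local_dir)
```

Calling `download_muyan_tts()` once while online returns the directory containing the downloaded files. After that, inference can run locally by following the instructions in the GitHub repository.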