Xiaomi Launches In-House MiMo-V2-TTS Large Text-to-Speech Model with Fine-Grained Control over Multiple Dialects and Emotions
Xiaomi has unveiled MiMo-V2-TTS, an in-house text-to-speech (TTS) model that advances controllable, expressive speech synthesis. Built on a custom Audio Tokenizer and a multi-codebook architecture, and trained with large-scale pre-training, it allows precise emotional adjustment from the macro level down to the micro level. The model produces natural, human-like prosody, supports diverse vocal styles, and can handle emotional transitions within a single sentence.