AIBase

MOSS-Speech Goes Open Source: China's First Speech-to-Speech Large Model, Bypassing Intermediate Text

The MOSS team at Fudan University has released MOSS-Speech, which the team describes as the first model to achieve true end-to-end speech dialogue, and has open-sourced it on Hugging Face. The model uses a 'layer splitting' architecture: the original text model is frozen, and new layers are added for speech understanding, semantic alignment, and a vocoder. This lets it handle spoken Q&A, emotional imitation, and laughter generation in a single step, rather than through the traditional three-step cascade of speech recognition, text generation, and speech synthesis. In evaluations, the word error rate fell to 4.1% on the ZeroSpeech2025 task, and emotion recognition accuracy reached 91.2%.
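For readers unfamiliar with the metric: word error rate (WER) is conventionally defined as the word-level edit distance between a reference transcript and a system's output (substitutions + deletions + insertions) divided by the number of reference words, so a WER of 4.1% corresponds to roughly 41 word errors per 1,000 reference words. The minimal Python sketch below implements that standard definition. It is not the MOSS team's evaluation code, and the example sentences are hypothetical.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Standard WER: (substitutions + deletions + insertions) / reference word count."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # dp[i][j] = minimum number of edits turning ref[:i] into hyp[:j]
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i  # delete all i reference words
    for j in range(len(hyp) + 1):
        dp[0][j] = j  # insert all j hypothesis words

    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(
                dp[i - 1][j] + 1,         # deletion
                dp[i][j - 1] + 1,         # insertion
                dp[i - 1][j - 1] + cost,  # substitution (or match if cost == 0)
            )
    return dp[len(ref)][len(hyp)] / len(ref)


if __name__ == "__main__":
    # One substitution ("the" -> "a") out of six reference words.
    print(round(word_error_rate("the cat sat on the mat", "the cat sat on a mat"), 4))  # 0.1667
```

Because WER is normalized by the reference length, insertions can push it above 100%, which is why it is usually reported alongside the test set rather than as an absolute accuracy.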

© 2025 AIBase