AIBase
AI News

Qwen Launches CoGenAV Multimodal Speech Representation Model with Synchronized Perception of Audio and Visual

Qwen recently released CoGenAV, a speech representation model built on audio-visual synchronization that targets the problem of noise interference in speech recognition. Traditional speech recognition performs poorly in noisy environments. CoGenAV instead learns the temporal alignment among audio, visual, and text signals to build a more robust and generalizable speech representation framework, improving tasks such as speech recognition (VSR/AVSR), speech reconstruction (AVSS/AVSE), and audio-visual synchronization.

9.4k 3 days ago
© 2025 AIBase