UniMuMo
Unified model for text, music, and motion generation.
Categories: Common Product · Music · Artificial Intelligence · Machine Learning
UniMuMo is a multimodal model that accepts any combination of text, music, and motion data as input conditions and generates outputs across all three modalities. The model bridges the modalities by converting music, motion, and text into token-based representations processed by a unified encoder-decoder architecture. Because it fine-tunes existing pretrained unimodal models rather than training from scratch, it significantly reduces computational requirements. UniMuMo achieves competitive results on all unidirectional generation benchmarks across the music, motion, and text modalities.
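To make the tokenize-then-decode idea concrete, below is a minimal conceptual sketch (not the official UniMuMo code) of how conditioning tokens from one modality and target tokens from another can share a single vocabulary and a single encoder-decoder. The vocabulary size, model width, and module names are illustrative assumptions; in practice the tokens would come from modality-specific tokenizers such as a music codec and a motion quantizer.

```python
# Illustrative sketch only: assumed sizes and a generic Transformer stand in
# for UniMuMo's actual tokenizers and fine-tuned pretrained backbones.
import torch
import torch.nn as nn

VOCAB = 1024  # assumed shared token vocabulary size
DIM = 256     # assumed model width


class UnifiedEncoderDecoder(nn.Module):
    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(VOCAB, DIM)
        self.model = nn.Transformer(
            d_model=DIM, nhead=8,
            num_encoder_layers=2, num_decoder_layers=2,
            batch_first=True,
        )
        self.head = nn.Linear(DIM, VOCAB)

    def forward(self, cond_tokens, target_tokens):
        # cond_tokens: tokenized conditioning input (text, music, or motion)
        # target_tokens: tokens of the modality being generated
        src = self.embed(cond_tokens)
        tgt = self.embed(target_tokens)
        out = self.model(src, tgt)
        return self.head(out)  # logits over the shared token vocabulary


# Hypothetical tokenized inputs, e.g. text condition -> music generation.
cond = torch.randint(0, VOCAB, (1, 32))    # conditioning tokens
target = torch.randint(0, VOCAB, (1, 64))  # target-modality tokens
logits = UnifiedEncoderDecoder()(cond, target)
print(logits.shape)  # torch.Size([1, 64, 1024])
```

Because every modality is reduced to tokens in one shared space, the same model can be conditioned on any modality and decode into any other, which is the property the description above refers to.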