CuMo
An architecture for scaling multimodal large language models (LLMs) with sparse Mixture-of-Experts blocks.
•Common Product•Programming•Multimodal Learning•Large Language Models
CuMo is a scaling architecture for multimodal large language models (LLMs). It improves model capability by incorporating sparse Top-K gated Mixture-of-Experts (MoE) blocks into both the vision encoder and the MLP connector, while adding virtually no extra activated parameters during inference. CuMo first pre-trains the MLP blocks, then initializes each expert in an MoE block from the corresponding pre-trained MLP (upcycling), and applies an auxiliary loss during the visual instruction fine-tuning stage to keep the expert load balanced. Trained entirely on open-source datasets, CuMo outperforms comparable models on various VQA and visual instruction-following benchmarks.
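To make the routing idea concrete, here is a minimal sketch of a sparse Top-K gated MoE layer in plain Python, with no deep-learning framework. It is not CuMo's actual implementation; the class and parameter names are illustrative. The key points it shows are: a router scores every expert, only the top-k experts run, and their outputs are combined with renormalized gate weights. In CuMo, each expert would additionally be initialized as a copy of a pre-trained MLP block (upcycling), so at initialization the MoE layer behaves like the original dense MLP.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

class TopKMoE:
    """Sparse Top-K gated Mixture-of-Experts layer (illustrative sketch).

    experts:        list of callables mapping a vector to a vector
    router_weights: one routing weight vector per expert
    k:              number of experts activated per input
    """
    def __init__(self, experts, router_weights, k=2):
        self.experts = experts
        self.router = router_weights
        self.k = k

    def __call__(self, x):
        # Router logits: dot product of the input with each expert's routing vector.
        logits = [sum(w_i * x_i for w_i, x_i in zip(w, x)) for w in self.router]
        gates = softmax(logits)
        # Keep only the top-k experts; renormalize their gate weights to sum to 1.
        topk = sorted(range(len(gates)), key=lambda i: gates[i], reverse=True)[:self.k]
        norm = sum(gates[i] for i in topk)
        # Only the selected experts are evaluated, which is why activated
        # parameters stay nearly constant as the number of experts grows.
        out = [0.0] * len(x)
        for i in topk:
            y = self.experts[i](x)
            out = [o + (gates[i] / norm) * y_j for o, y_j in zip(out, y)]
        return out

# Upcycling intuition: every expert starts as a copy of the same pre-trained MLP,
# so right after initialization the sparse layer matches the dense one.
pretrained_mlp = lambda v: [2.0 * t for t in v]  # stand-in for a pre-trained MLP block
moe = TopKMoE(
    experts=[pretrained_mlp] * 4,
    router_weights=[[0.1, 0.2], [0.3, -0.1], [0.0, 0.5], [-0.2, 0.1]],
    k=2,
)
dense_out = pretrained_mlp([1.0, 2.0])
sparse_out = moe([1.0, 2.0])
```

With identical upcycled experts, the gated combination reproduces the dense MLP's output regardless of which experts the router picks; training then specializes the experts, and an auxiliary loss discourages the router from collapsing onto a few of them.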
CuMo Visit Over Time
Monthly Visits
484
Bounce Rate
41.67%
Page per Visit
1.0
Visit Duration
00:00:00
CuMo Alternatives

Language Learning Games — AI text adventure games for language learning
•language learning•AI game
666

InternVL2_5-78B — Advanced multimodal large language model series
•Multimodal•Large Language Model
462