CuMo
An architecture for scaling multimodal large language models (LLMs) with sparse Mixture-of-Experts blocks.
•Common Product•Programming•Multimodal Learning•Large Language Models
CuMo is a scaling architecture for multimodal large language models (LLMs). It improves model capability by incorporating sparse Top-K gated Mixture-of-Experts (MoE) blocks into both the vision encoder and the MLP connector, while adding virtually no extra activated parameters during inference. CuMo first pre-trains the MLP blocks, then initializes each expert in an MoE block from the corresponding pre-trained MLP (upcycling), and applies an auxiliary loss during the visual instruction fine-tuning stage to keep the expert load balanced. Trained entirely on open-source datasets, CuMo outperforms comparable models on various VQA and visual instruction-following benchmarks.
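To make the routing idea concrete, here is a minimal sketch of a sparse Top-K gated MoE layer in plain Python, with no deep-learning framework. It is not CuMo's actual implementation; the class and parameter names are illustrative. The key points it shows are: a router scores every expert, only the top-k experts run, and their outputs are combined with renormalized gate weights. In CuMo, each expert would additionally be initialized as a copy of a pre-trained MLP block (upcycling), so at initialization the MoE layer behaves like the original dense MLP.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

class TopKMoE:
    """Sparse Top-K gated Mixture-of-Experts layer (illustrative sketch).

    experts:        list of callables mapping a vector to a vector
    router_weights: one routing weight vector per expert
    k:              number of experts activated per input
    """
    def __init__(self, experts, router_weights, k=2):
        self.experts = experts
        self.router = router_weights
        self.k = k

    def __call__(self, x):
        # Router logits: dot product of the input with each expert's routing vector.
        logits = [sum(w_i * x_i for w_i, x_i in zip(w, x)) for w in self.router]
        gates = softmax(logits)
        # Keep only the top-k experts; renormalize their gate weights to sum to 1.
        topk = sorted(range(len(gates)), key=lambda i: gates[i], reverse=True)[:self.k]
        norm = sum(gates[i] for i in topk)
        # Only the selected experts are evaluated, which is why activated
        # parameters stay nearly constant as the number of experts grows.
        out = [0.0] * len(x)
        for i in topk:
            y = self.experts[i](x)
            out = [o + (gates[i] / norm) * y_j for o, y_j in zip(out, y)]
        return out

# Upcycling intuition: every expert starts as a copy of the same pre-trained MLP,
# so right after initialization the sparse layer matches the dense one.
pretrained_mlp = lambda v: [2.0 * t for t in v]  # stand-in for a pre-trained MLP block
moe = TopKMoE(
    experts=[pretrained_mlp] * 4,
    router_weights=[[0.1, 0.2], [0.3, -0.1], [0.0, 0.5], [-0.2, 0.1]],
    k=2,
)
dense_out = pretrained_mlp([1.0, 2.0])
sparse_out = moe([1.0, 2.0])
```

With identical upcycled experts, the gated combination reproduces the dense MLP's output regardless of which experts the router picks; training then specializes the experts, and an auxiliary loss discourages the router from collapsing onto a few of them.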
CuMo Visit Over Time
Monthly Visits
484
Bounce Rate
41.67%
Page per Visit
1.0
Visit Duration
00:00:00
CuMo Alternatives

Language Learning Games — AI text adventure games for language learning
•language learning•AI game
666

InternVL2_5-78B — Advanced multimodal large language model series
•Multimodal•Large Language Model
462