AIBase
AI Products

Video-LLaVA

Learns a united visual representation for images and videos by alignment before projection.

AI video search
10.7k

Models

Video Llava

AnasMohamed

A large-scale vision-language model based on the Vision Transformer architecture, supporting cross-modal understanding between images and text.

Multimodal, GGUF
194
0

Video LLaVA 7B Hf

LanguageBind

Video-LLaVA is an open-source multimodal model trained by fine-tuning a large language model on multimodal instruction-following data. It generates text and can handle prompts that interleave images and videos.

Multimodal, Transformers
13.2k
42

Video LLaVA 7B

LanguageBind

Video-LLaVA is a multimodal model that learns a united visual representation by aligning image and video features before projection, allowing it to handle visual reasoning tasks on both images and videos.

Multimodal, Transformers
2.1k
85
© 2026 AIBase