Video-LLaMA
Public
[EMNLP 2023 Demo] Video-LLaMA: An Instruction-tuned Audio-Visual Language Model for Video Understanding
Topics: blip2, cross-modal-pretraining, large-language-models, llama, minigpt4, multi-modal-chatgpt, video-language-pretraining, vision-language-pretraining
Created: 2023-05-06T23:35:19
Updated: 2025-03-26T16:23:44
Stars: 3.0K
Stars Increase: 5