InternVideo
Public
[ECCV2024] Video Foundation Models & Data for Multimodal Understanding
Topics: action-recognition, benchmark, contrastive-learning, foundation-models, instruction-tuning, masked-autoencoder, multimodal, open-set-recognition, self-supervised, spatio-temporal-action-localization
Created: 2022-11-23T20:57:00
Updated: 2025-03-25T17:09:41
Stars: 2.0K
Stars increase: 1