UniVL
An official implementation for "UniVL: A Unified Video and Language Pre-Training Model for Multimodal Understanding and Generation"
Topics: alignment, caption, caption-task, coin, joint, localization, msrvtt, multimodal-sentiment-analysis, multimodality, pretrain
Created: 2020-10-30T13:22:22
Updated: 2025-03-25T13:19:49
https://arxiv.org/abs/2002.06353
Stars: 359
Stars increase: 0