mPLUG
Public
mPLUG: Effective and Efficient Vision-Language Learning by Cross-modal Skip-connections. (EMNLP 2022)
image-captioning, image-text, image-text-retrieval, multimodal, pretraining, pytorch, transformer, visual-language, vqa
Created: 2023-05-08T15:32:30
Updated: 2025-02-20T17:04:42
https://arxiv.org/abs/2205.12005
Stars: 96
Stars Increase: 0