SPARC
Enhancing fine-grained understanding of image-text pre-training
Common Product, Image, Image-text pre-training, Fine-grained understanding
SPARC is a simple method for pre-training on image-text pairs that aims to learn more fine-grained multimodal representations. It computes a sparse similarity between image patches and language tokens and uses it to group the patches relevant to each token. The learned representations encode both global and local information, because training combines a fine-grained sequence-wise contrastive loss with a global contrastive loss between image and text embeddings. SPARC improves results on coarse-grained image-level tasks and on fine-grained region-level tasks, including classification, retrieval, object detection, and segmentation. It also improves model trustworthiness and image description capabilities.
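The idea of sparse patch-token grouping plus two contrastive losses can be illustrated with a small NumPy sketch. This is not the authors' implementation. The function names, the min-max normalization, the `1 / num_patches` sparsity threshold, the mean-pooled global embeddings, the temperature, and the `fine_weight` parameter are all assumptions chosen for illustration.

```python
import numpy as np

def l2_normalize(x, axis=-1, eps=1e-8):
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)

def log_softmax(logits, axis=-1):
    shifted = logits - logits.max(axis=axis, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=axis, keepdims=True))

def group_patches_per_token(patches, tokens, eps=1e-8):
    """patches: (P, D), tokens: (T, D) -> grouped patches (T, D), weights (T, P)."""
    sim = tokens @ patches.T                                # (T, P) similarities
    lo = sim.min(axis=1, keepdims=True)
    hi = sim.max(axis=1, keepdims=True)
    sim = (sim - lo) / (hi - lo + eps)                      # min-max scale per token (assumed)
    threshold = 1.0 / patches.shape[0]                      # assumed sparsity threshold
    sparse = np.where(sim >= threshold, sim, 0.0)           # drop weakly related patches
    weights = sparse / (sparse.sum(axis=1, keepdims=True) + eps)
    return weights @ patches, weights                       # weighted sum of patches per token

def symmetric_contrastive_loss(a, b, temperature=0.07):
    """InfoNCE-style loss in both directions; row i of a matches row i of b."""
    a, b = l2_normalize(a), l2_normalize(b)
    logits = a @ b.T / temperature
    idx = np.arange(len(a))
    loss_ab = -log_softmax(logits, axis=1)[idx, idx].mean()
    loss_ba = -log_softmax(logits, axis=0)[idx, idx].mean()
    return 0.5 * (loss_ab + loss_ba)

def sparc_style_loss(patches, tokens, fine_weight=1.0):
    """patches: (B, P, D), tokens: (B, T, D). Global loss plus fine-grained loss."""
    # Global term: contrast pooled image and text embeddings across the batch.
    # Mean pooling is a stand-in; a real model would use learned projections.
    global_loss = symmetric_contrastive_loss(patches.mean(axis=1), tokens.mean(axis=1))
    # Fine-grained term: within each pair, contrast every token against the
    # patch group built for it, using the other tokens of that pair as negatives.
    fine_losses = []
    for p, t in zip(patches, tokens):
        grouped, _ = group_patches_per_token(p, t)
        fine_losses.append(symmetric_contrastive_loss(t, grouped))
    return global_loss + fine_weight * np.mean(fine_losses)

rng = np.random.default_rng(0)
patches = rng.normal(size=(2, 6, 4))   # 2 images, 6 patches, 4-dim embeddings
tokens = rng.normal(size=(2, 3, 4))    # 2 captions, 3 tokens, 4-dim embeddings
loss = sparc_style_loss(patches, tokens)
```

In this sketch the fine-grained term only compares tokens and patch groups inside a single image-text pair, so it needs no extra cross-batch similarity matrix at the patch level. Only the global term contrasts across the batch.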
SPARC Visit Over Time
Monthly Visits: 25537072
Bounce Rate: 44.24%
Pages per Visit: 5.9
Visit Duration: 00:04:47