Open-Vocabulary SAM
Interactive Segmentation and Recognition Model
Common Product · Image · Vision Foundation Model · Interactive Segmentation
Open-Vocabulary SAM is a vision foundation model built upon SAM and CLIP, focused on interactive segmentation and recognition. It unifies SAM and CLIP in a single framework through two knowledge transfer modules, SAM2CLIP and CLIP2SAM. Extensive experiments across various datasets and detectors demonstrate the effectiveness of Open-Vocabulary SAM in segmentation and recognition tasks, where it significantly outperforms naive baselines that simply combine SAM and CLIP. Moreover, training with image classification data enables the method to segment and recognize approximately 22,000 categories.
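The description above centers on two knowledge transfer modules that bridge SAM's segmentation features and CLIP's open-vocabulary semantics. The sketch below is a minimal, hypothetical illustration of how such a combined pipeline could be wired in PyTorch; every class name, argument, and feature dimension here (e.g. `OpenVocabSAMSketch`, the 256/512 channel sizes, the injected `sam_encoder`, `sam_decoder`, and `clip_text_encoder` callables) is an assumption for illustration and does not reflect the project's actual code or API.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class OpenVocabSAMSketch(nn.Module):
    """Hypothetical sketch of a unified SAM + CLIP pipeline with two
    knowledge transfer modules, loosely following the description above."""

    def __init__(self, sam_encoder, sam_decoder, clip_text_encoder):
        super().__init__()
        self.sam_encoder = sam_encoder            # frozen SAM image encoder (assumed)
        self.sam_decoder = sam_decoder            # SAM mask decoder (assumed)
        self.clip_text_encoder = clip_text_encoder  # frozen CLIP text encoder (assumed)
        # SAM2CLIP: projects SAM image features toward CLIP's embedding space
        self.sam2clip = nn.Linear(256, 512)
        # CLIP2SAM: injects CLIP-space semantics back into the mask decoder
        self.clip2sam = nn.Linear(512, 256)

    def forward(self, image, prompt, class_names):
        # 1. Encode the image once with the SAM backbone -> (B, 256, H, W)
        img_feats = self.sam_encoder(image)

        # 2. SAM2CLIP: pool and project SAM features into CLIP space -> (B, 512)
        clip_like = self.sam2clip(img_feats.flatten(2).mean(-1))

        # 3. Encode the open-vocabulary class names with CLIP's text encoder -> (C, 512)
        text_embeds = self.clip_text_encoder(class_names)

        # 4. CLIP2SAM: feed semantic features back for prompt-driven mask decoding
        semantic_query = self.clip2sam(clip_like)
        masks = self.sam_decoder(img_feats, prompt, semantic_query)

        # 5. Recognition: cosine similarity between image and text embeddings -> (B, C)
        logits = F.normalize(clip_like, dim=-1) @ F.normalize(text_embeds, dim=-1).T
        return masks, logits.softmax(dim=-1)
```

In this sketch the interactive prompt (e.g. a point or box) drives mask decoding as in SAM, while the CLIP text embeddings supply the open-vocabulary labels used for recognition.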
Open-Vocabulary SAM Visits Over Time
Monthly Visits: 23,904,807
Bounce Rate: 43.33%
Pages per Visit: 5.8
Visit Duration: 00:04:51