Researchers have introduced Set-of-Mark (SoM), a new visual prompting method that improves the performance of OpenAI's multimodal large model GPT-4V on fine-grained visual tasks. SoM uses an interactive segmentation model to partition an image into regions and overlays a distinct mark (such as a number) on each region before the image is given to GPT-4V. These marks give the model an explicit handle on individual objects, helping it reason about objects and their spatial relationships, and with this prompting GPT-4V outperforms dedicated specialist models and other open-source multimodal models on several visual tasks. The work demonstrates GPT-4V's potential for fine-grained visual understanding.
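
To make the idea concrete, below is a minimal sketch of a SoM-style pipeline, not the authors' implementation: segment the image with an off-the-shelf segmentation model (here SAM's automatic mask generator stands in for the interactive segmentation step), overlay a numeric mark at the center of each region, and send the marked image plus a question to GPT-4V through the OpenAI API. The checkpoint path, the model name `gpt-4-vision-preview`, the prompt wording, and the example file are assumptions, not details from the article.

```python
# SoM-style pipeline sketch (assumptions noted in the comments below).
import base64

import cv2
import numpy as np
from openai import OpenAI
from segment_anything import SamAutomaticMaskGenerator, sam_model_registry


def mark_image(image_rgb: np.ndarray, checkpoint: str = "sam_vit_h.pth") -> np.ndarray:
    """Segment the image and draw a numeric mark on each region."""
    sam = sam_model_registry["vit_h"](checkpoint=checkpoint)  # checkpoint path is a placeholder
    masks = SamAutomaticMaskGenerator(sam).generate(image_rgb)
    marked = image_rgb.copy()
    # Number regions from largest to smallest so prominent regions get low indices.
    for i, m in enumerate(sorted(masks, key=lambda m: m["area"], reverse=True), start=1):
        ys, xs = np.nonzero(m["segmentation"])
        cx, cy = int(xs.mean()), int(ys.mean())  # rough centroid of the region
        # Draw the label twice (thick white, thin red) so it stays readable on any background.
        cv2.putText(marked, str(i), (cx, cy), cv2.FONT_HERSHEY_SIMPLEX,
                    1.0, (255, 255, 255), 3, cv2.LINE_AA)
        cv2.putText(marked, str(i), (cx, cy), cv2.FONT_HERSHEY_SIMPLEX,
                    1.0, (255, 0, 0), 1, cv2.LINE_AA)
    return marked


def ask_gpt4v(marked_rgb: np.ndarray, question: str) -> str:
    """Send the marked image plus a question to GPT-4V and return its answer."""
    ok, png = cv2.imencode(".png", cv2.cvtColor(marked_rgb, cv2.COLOR_RGB2BGR))
    b64 = base64.b64encode(png.tobytes()).decode()
    client = OpenAI()  # reads OPENAI_API_KEY from the environment
    resp = client.chat.completions.create(
        model="gpt-4-vision-preview",  # assumed model name
        messages=[{
            "role": "user",
            "content": [
                {"type": "text",
                 "text": "Each region in the image is labeled with a number. "
                         "Refer to regions by their numbers. " + question},
                {"type": "image_url",
                 "image_url": {"url": f"data:image/png;base64,{b64}"}},
            ],
        }],
        max_tokens=300,
    )
    return resp.choices[0].message.content


if __name__ == "__main__":
    image = cv2.cvtColor(cv2.imread("example.jpg"), cv2.COLOR_BGR2RGB)  # hypothetical input image
    print(ask_gpt4v(mark_image(image), "Which numbered region contains the dog, and what is it next to?"))
```

Because the regions are explicitly numbered, the model can ground its answer in specific marks ("the dog is region 3, next to the chair in region 5"), which is the kind of fine-grained, region-level referencing that unmarked images make difficult.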