VGDiffZero
Public
[ICASSP 2024] VGDiffZero: Text-to-image Diffusion Models Can Be Zero-shot Visual Grounders
computer-vision, referring-expression-comprehension, stable-diffusion, text-to-image-generation, vision-language-model, visual-grounding, zero-shot-learning
Created: 2023-09-03T20:41:48
Updated: 2025-02-28T09:31:55
https://arxiv.org/abs/2309.01141
Stars: 16
Stars Increase: 0