AutoVisType
Probing vision-language model alignment with human expert visual grouping over a stratified sample of the VIS30K dataset.