CompBench
CompBench evaluates the comparative reasoning of multimodal large language models (MLLMs) with 40K image pairs and questions across 8 dimensions of relative comparison: visual attribute, existence, state, emotion, temporality, spatiality, quantity, and quality. CompBench covers diverse visual domains, including animals, fashion, sports, and scenes.
Topics: benchmark, evaluation-llms, foundation-models, human-annotation, large-language-models, llms, llms-benchmarking, multimodal-deep-learning, multimodal-large-language-models, reasoning
Created: 2024-07-24T01:48:04
Updated: 2025-03-26T03:04:53
https://compbench.github.io/
Stars: 37
Stars Increase: 0