generalization
Thematic Generalization Benchmark: measures how effectively various LLMs can infer a narrow or specific "theme" (category/rule) from a small set of examples and anti-examples, then detect which item truly fits that theme among a collection of misleading candidates.
Created: 2025-01-14T19:05:40
Updated: 2025-03-27T11:08:44
Stars: 60
Stars Increase: 0