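HumanEval - A benchmark of 164 hand-written Python programming problems that evaluates code generation by running unit tests against model-written solutions.
MBPP - Mostly Basic Python Problems, a benchmark of roughly 1,000 crowd-sourced, entry-level Python tasks, each paired with test cases.
MMLU - Massive Multitask Language Understanding, a multiple-choice benchmark that evaluates knowledge and reasoning across 57 subjects.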