
reasoning-benchmarks

Public

A reproducible harness for evaluating LLM reasoning strategies (CoT, Self-Consistency, ToT, etc.) across benchmarks like GSM8K, ARC-Challenge, and MMLU. Supports OpenAI, Hugging Face, and Ollama backends with unified metrics and plots.
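As a rough illustration of what a Self-Consistency run with a unified metric could look like, here is a minimal sketch. It is not taken from the repository: `sample`, `self_consistency`, and `exact_match_accuracy` are hypothetical names, and the sampler stands in for whichever backend (OpenAI, Hugging Face, or Ollama) produces the final answer string.

```python
from collections import Counter
from typing import Callable, Sequence


def self_consistency(sample: Callable[[str], str], question: str, n: int = 5) -> str:
    """Draw n answers from a hypothetical sampler and return the most common one."""
    answers = [sample(question).strip() for _ in range(n)]
    # most_common(1) returns [(answer, count)]; ties go to the answer seen first.
    return Counter(answers).most_common(1)[0][0]


def exact_match_accuracy(predictions: Sequence[str], references: Sequence[str]) -> float:
    """Fraction of predictions equal to their reference after stripping whitespace."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not references:
        return 0.0
    hits = sum(p.strip() == r.strip() for p, r in zip(predictions, references))
    return hits / len(references)
```

Plain exact match is assumed here for simplicity; benchmarks such as GSM8K usually need answer extraction and normalization before comparison.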

Created: 2025-09-04T02:11:10
Updated: 2025-09-04T02:11:55
Stars: 0
Stars increase: 0