reasoning-benchmarks

Public

A reproducible harness for evaluating LLM reasoning strategies (CoT, Self-Consistency, ToT, etc.) across benchmarks like GSM8K, ARC-Challenge, and MMLU. Supports OpenAI, Hugging Face, and Ollama backends with unified metrics and plots.
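As a rough illustration of what evaluating a strategy such as Self-Consistency involves, here is a minimal sketch: several reasoning paths are sampled per question, the most frequent final answer is kept, and the result is scored by exact match. This is not the harness's actual API. The function names `majority_vote` and `exact_match_accuracy` and the sample data are hypothetical, chosen only for this example.

```python
from collections import Counter


def majority_vote(answers):
    """Return the most frequent final answer among sampled reasoning paths.

    Ties are broken in favor of the answer that appeared first.
    """
    counts = Counter(a.strip() for a in answers)
    return counts.most_common(1)[0][0]


def exact_match_accuracy(predictions, references):
    """Fraction of predictions that equal their reference after stripping whitespace."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    correct = sum(p.strip() == r.strip() for p, r in zip(predictions, references))
    return correct / len(references)


# Hypothetical final answers extracted from five sampled reasoning paths per question.
sampled = {
    "q1": ["18", "18", "20", "18", "17"],
    "q2": ["5", "7", "7", "7", "5"],
}
# Hypothetical gold answers.
gold = {"q1": "18", "q2": "5"}

preds = [majority_vote(sampled[q]) for q in gold]
refs = [gold[q] for q in gold]
accuracy = exact_match_accuracy(preds, refs)
print(preds, accuracy)
```

In a real run, the sampled answers would come from a model backend called with a nonzero sampling temperature, and answer extraction would depend on the benchmark's format.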

benchmark chain-of-thought llm llms reasoning

Created: 2025-09-04T02:11:10

Updated: 2025-09-04T02:11:55


Related projects

Langchain

Hot

Elixir implementation of a LangChain style framework that lets Elixir projects integrate with and leverage LLMs.

121438

1 year ago

+224 today

Dify

Hot

agent

Dify is an open-source LLM app development platform. Dify's intuitive interface combines AI workflow, RAG pipeline, agent capabilities, model management, observability features and more, letting you quickly go from prototype to production.

120901

1 year ago

+280 today

LLMs From Scratch

Hot

Implement a ChatGPT-like LLM in PyTorch from scratch, step by step

80697

1 year ago

+221 today

GPT4All

ai-chat

GPT4All: Run Local LLMs on Any Device. Open-source and available for commercial use.

76955

1 year ago

+7 today

Browser Use

Hot

ai-agents

Make websites accessible for AI agents

73433

1 year ago

+154 today

vLLM

Hot

amd

A high-throughput and memory-efficient inference and serving engine for LLMs

64909

2 years ago

+236 today

LLaMA Factory

Hot

agent

Unified Efficient Fine-Tuning of 100+ LLMs & VLMs (ACL 2024)

63700

1 year ago

+147 today

MetaGPT

Hot

agent

The Multi-Agent Framework: First AI Software Company, Towards Natural Language Programming

60238

1 year ago

+395 today

AutoGen

Hot

agentic

A programming framework for agentic AI. PyPI: autogen-agentchat | Discord: https://aka.ms/autogen-discord | Office Hour: https://aka.ms/autogen-officehour

52346

2 years ago

+103 today

AnythingLLM

Hot

agent-framework-javascript

The all-in-one Desktop & Docker AI application with built-in RAG, AI agents, No-code agent builder, and more.

51978

1 year ago

+113 today
