PrismBench

PrismBench is a framework for evaluating Large Language Model (LLM) capabilities with Monte Carlo Tree Search (MCTS). It systematically maps model strengths, automatically discovers challenging concept combinations, and provides detailed performance analysis. It also supports containerized deployment and OpenAI-compatible APIs.

Topics: automated-testing, benchmarking, code-generation, llm-evaluation, machine-learning, monte-carlo-tree-search

Created: 2025-05-16T02:34:56

Updated: 2025-06-07T23:44:00

https://prismbench.github.io/Demo/
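The summary says PrismBench uses Monte Carlo Tree Search to discover challenging concept combinations. PrismBench's own search code is not shown here, so the following is only a generic MCTS sketch under stated assumptions:

- The concept names in `CONCEPTS` are hypothetical.
- The two-concept depth limit (`MAX_DEPTH`) is hypothetical.
- The `evaluate` function is a made-up stand-in. It simulates a model passing or failing with invented probabilities instead of calling a real model.

The reward is 1 when the simulated model fails. Because of this, UCB1 selection gradually concentrates visits on harder combinations.

```python
import math
import random
from dataclasses import dataclass, field

# Hypothetical concept pool and depth limit (illustration only).
CONCEPTS = ["loops", "recursion", "sorting", "hashing"]
MAX_DEPTH = 2


@dataclass
class Node:
    combo: frozenset                      # concepts combined at this node
    parent: "Node | None" = None
    children: list = field(default_factory=list)
    visits: int = 0
    value: float = 0.0                    # sum of rewards (1.0 = model failed)

    def untried(self):
        """Concept combinations reachable from here that have no child node yet."""
        if len(self.combo) >= MAX_DEPTH:
            return []
        tried = {ch.combo for ch in self.children}
        return [self.combo | {c} for c in CONCEPTS
                if c not in self.combo and self.combo | {c} not in tried]


def ucb1(child, parent_visits, c=1.4):
    """Mean reward plus an exploration bonus (standard UCB1)."""
    mean = child.value / child.visits
    return mean + c * math.sqrt(math.log(parent_visits) / child.visits)


def evaluate(combo, rng):
    """Hypothetical stand-in for running a model on a problem mixing these concepts.

    Each extra concept, and 'recursion' in particular, lowers the invented
    success probability. Returns 1.0 on failure so search favors hard combos.
    """
    p_success = 0.9 - 0.3 * (len(combo) - 1) - (0.2 if "recursion" in combo else 0.0)
    return 0.0 if rng.random() < p_success else 1.0


def search(iterations=500, seed=0):
    rng = random.Random(seed)
    root = Node(frozenset())
    for _ in range(iterations):
        node = root
        # Selection: descend through fully expanded nodes using UCB1.
        while not node.untried() and node.children:
            parent = node
            node = max(parent.children, key=lambda ch: ucb1(ch, parent.visits))
        # Expansion: add one new concept combination if any remain.
        options = node.untried()
        if options:
            child = Node(rng.choice(options), parent=node)
            node.children.append(child)
            node = child
        # Simulation: evaluate the (simulated) model on this combination.
        reward = evaluate(node.combo, rng)
        # Backpropagation: update statistics up to the root.
        while node is not None:
            node.visits += 1
            node.value += reward
            node = node.parent
    return root


def hardest(root, top=3):
    """Rank two-concept nodes by mean failure rate."""
    leaves = [g for ch in root.children for g in ch.children]
    leaves.sort(key=lambda n: n.value / n.visits, reverse=True)
    return [(sorted(n.combo), n.visits, round(n.value / n.visits, 2))
            for n in leaves[:top]]
```

Calling `print(hardest(search()))` lists the combinations the sketch found hardest. A pair such as `{loops, recursion}` can appear under two different parents, because this is a plain tree rather than a deduplicated graph.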


