AI News

OpenAI Releases MLE-bench: A Benchmark for Evaluating AI Agents

In a recent study, the OpenAI research team introduced MLE-bench, a new benchmark for assessing how well AI agents perform machine learning engineering. The benchmark draws on 75 machine-learning-related Kaggle competitions and tests the skills agents need in real-world engineering work, including training models, preparing datasets, and running experiments. To ground the evaluation, the team uses publicly available Kaggle leaderboard data to establish human baseline performance metrics for each competition.
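
To make the leaderboard-based evaluation concrete, here is a minimal Python sketch (not the official MLE-bench code; the function name and inputs are illustrative assumptions) of positioning an agent's submission score against a competition's public leaderboard:

    from bisect import bisect_left, bisect_right

    def leaderboard_percentile(agent_score, leaderboard_scores, higher_is_better=True):
        """Fraction of public-leaderboard entries the agent's score beats."""
        scores = sorted(leaderboard_scores)
        if higher_is_better:
            beaten = bisect_left(scores, agent_score)                 # entries strictly below
        else:
            beaten = len(scores) - bisect_right(scores, agent_score)  # entries strictly above (error-style metrics)
        return beaten / len(scores)

    # Illustrative example: an agent scoring 0.92 on an accuracy-style metric.
    human_scores = [0.80, 0.85, 0.88, 0.90, 0.93, 0.95]
    print(f"Outperforms {leaderboard_percentile(0.92, human_scores):.0%} of entries")

A real benchmark would aggregate such per-competition comparisons (for example, counting medal-level finishes); the snippet only shows the basic percentile arithmetic.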


AI Products

MLE-bench

Benchmark for assessing the capabilities of AI agents in machine learning engineering.

AI model evaluation

Models


Model            Provider    Input $/M tokens   Output $/M tokens   Context length (K tokens)
GPT-5            OpenAI      $8.75              $70                 400
GLM-4.5          ChatGLM     $2                 $8                  128
Grok-4 Heavy     xAI         -                  -                   -
Claude Sonnet 4  Anthropic   $21                $105                200
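
As a quick illustration of how per-million-token pricing translates into the cost of a single request (a minimal sketch; the prices come from the listing above, while the token counts are made-up examples):

    def request_cost(input_tokens, output_tokens, input_price_per_m, output_price_per_m):
        """Cost of one request when prices are quoted per million tokens."""
        return (input_tokens * input_price_per_m + output_tokens * output_price_per_m) / 1_000_000

    # Illustrative example: a 3,000-token prompt and a 1,000-token reply on GLM-4.5 ($2 in / $8 out).
    print(f"${request_cost(3_000, 1_000, 2.0, 8.0):.4f}")  # -> $0.0140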
