AIBase

AI News

OpenAI's Latest Benchmark Test: AI Programming Ability Matches One-Quarter of Humans, Revealing Limitations

OpenAI recently released a report on AI programming capabilities that assesses the current state of AI in software development using real-world development tasks worth a total of $1 million. The benchmark, named SWE-Lancer, covers more than 1,400 real freelance tasks from Upwork and evaluates AI performance in both direct development and project management. The best-performing model, Claude 3.5 Sonnet, achieved a success rate of 26.2% on coding tasks; the report also gives results for project management tasks.

12.4k 12-02

OpenAI Launches SWE-Lancer Benchmark: Evaluating Model Performance on Real-World Freelance Software Engineering Tasks

13.7k 12-18

AI Products

SWE-Lancer

SWE-Lancer is a benchmark consisting of more than 1,400 freelance software engineering tasks with a total value of $1 million.

Research tools
9k