Shanghai Jiao Tong University and Team Unveil SWE-Explore Benchmark Revealing the Line-Level Localization Flaws of AI Coding Agents

AIbase基地

Published in AI News · 4 min read · Jun 15, 2026

An international research team that includes Shanghai Jiao Tong University today officially released SWE-Explore, a new benchmark for AI coding agents. By decoupling code search from the actual repair phase, the benchmark provides the first quantitative evidence of significant shortcomings in current agents' line-level accuracy. The study moves beyond the earlier evaluation model that relied solely on the final repair rate, offers a standard for directly measuring the quality of an agent's upstream search, and pushes AI software engineering evaluation toward finer-grained analysis.

Traditional benchmarks such as SWE-bench focus only on end-to-end results, which often masks agents' real weaknesses in reading and understanding code. To address this, the research team drew on the successful trajectories of mainstream large models such as GPT-5.4, Gemini 3 Pro, Claude Sonnet 4.6, and Kimi K2.6, extracted the code segments on which multiple independent solution paths agree, and built a dataset of 848 defect tasks spanning 10 programming languages and 203 open-source projects.
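The consensus step described above can be illustrated with a minimal sketch. The article does not specify how SWE-Explore computes consensus, so everything here is an assumption for illustration: the hypothetical `consensus_region` function, the `(file, line)` representation of what a trajectory read, and the simple vote threshold `min_votes`.

```python
from collections import Counter
from typing import Iterable, Set, Tuple

# Assumed representation: a location is a (file path, line number) pair.
Location = Tuple[str, int]


def consensus_region(
    trajectories: Iterable[Set[Location]], min_votes: int
) -> Set[Location]:
    """Return the locations read by at least `min_votes` successful trajectories.

    Hypothetical helper: the voting rule is an assumption, not the
    benchmark's documented procedure.
    """
    votes: Counter = Counter()
    for visited in trajectories:
        # Each trajectory is a set, so it votes at most once per location.
        votes.update(visited)
    return {loc for loc, count in votes.items() if count >= min_votes}


# Three made-up successful runs on the same task.
runs = [
    {("app/db.py", 10), ("app/db.py", 11), ("app/api.py", 5)},
    {("app/db.py", 10), ("app/db.py", 11)},
    {("app/db.py", 11), ("app/util.py", 3)},
]
core = consensus_region(runs, min_votes=2)
print(sorted(core))
```

With this toy input, only the two lines in `app/db.py` are read by at least two runs, so they form the consensus region; locations visited by a single run are treated as incidental exploration.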

The evaluation shows that although general-purpose coding agents such as Claude Code and OpenHands perform well at file-level localization, their coverage of the core region drops sharply to between 14% and 19% once the focus narrows to specific lines of code. Ablation experiments further confirmed a "minimum context threshold" effect: when less than 50% of the core region is visible, the model's repair generally fails; once visibility crosses into the 50% to 75% range, the repair success rate rises dramatically.
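The gap between file-level and line-level results is easier to see with a small worked example. The sketch below is only one plausible way to define the two coverage figures; the function names, the `(file, line)` representation, and the sample data are assumptions, since the article does not give the benchmark's exact metric definitions.

```python
from typing import Set, Tuple

# Assumed representation: a location is a (file path, line number) pair.
Location = Tuple[str, int]


def file_coverage(core: Set[Location], retrieved: Set[Location]) -> float:
    """Fraction of core files that the agent touched at all."""
    core_files = {path for path, _ in core}
    if not core_files:
        raise ValueError("core region must not be empty")
    retrieved_files = {path for path, _ in retrieved}
    return len(core_files & retrieved_files) / len(core_files)


def line_coverage(core: Set[Location], retrieved: Set[Location]) -> float:
    """Fraction of core lines that the agent actually read."""
    if not core:
        raise ValueError("core region must not be empty")
    return len(core & retrieved) / len(core)


# Made-up task: the core region is four lines in one file.
core = {("app/db.py", n) for n in (10, 11, 12, 13)}
# The agent opened the right file but read mostly the wrong lines.
retrieved = {("app/db.py", 10), ("app/db.py", 40), ("app/db.py", 41)}

print(file_coverage(core, retrieved))  # 1.0
print(line_coverage(core, retrieved))  # 0.25
```

In this toy case the agent scores perfectly at the file level while seeing only a quarter of the core lines, which under the reported threshold effect would place it below the 50% visibility mark where repairs generally fail.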

The results indicate that the current bottleneck for AI agents is not entirely patch-writing capability but rather the ability to accurately filter and capture critical context. In an industry where project managers reject half of all proposals to adopt automation, the "less filtering, more reading" direction proposed by SWE-Explore points the way for the architecture of next-generation specialized code localization systems (such as CoSIL) and accelerates the paradigm shift in automated software engineering from "brute-force generation" to "precise retrieval."

This article is from AIbase Daily
