The New Standard for Programming Agents! MiniMax Releases OctoCodingBench Benchmark
MiniMax introduces OctoCodingBench, an open-source benchmark that evaluates how well programming agents follow instructions in code repository environments. It addresses gaps in existing benchmarks such as SWE-bench, which focus on task completion...