OSWorld-MCP: A New Evaluation Benchmark to Promote the Development of Computer Agent Products
OSWorld-MCP is the first benchmark to evaluate computer agents' tool usage, GUI operation, and decision-making together in a real environment, using 158 verified tools exposed through the Model Context Protocol (MCP)...