As artificial intelligence (AI) technology has advanced, the way developers use programming tools has changed significantly. Until recently, code editors such as Cursor, Windsurf, and GitHub Copilot were the mainstream of AI-driven software development. But with the rise of agentic AI and the popularity of "vibe coding," the way AI systems interact with software has quietly shifted. AI tools now increasingly work directly in the system's command-line interface, the terminal, rather than inside a code editor.
The terminal, familiar from 1990s hacker movies as a black-and-white screen, may not look as polished as a modern code editor, but it remains a powerful tool in software development. AI agents can write and debug code, but terminal tools are often what it takes to turn that code into usable software.
This shift is most evident in the command-line coding tools released by the major labs. Since February this year, Anthropic, DeepMind, and OpenAI have launched Claude Code, Gemini CLI, and Codex CLI, respectively, and these tools have quickly become some of the companies' most popular products.
Although this change may not be immediately noticeable, it marks a fundamental shift in how AI interacts with computers, and many experts believe the trend is only beginning. Mike Merrill, co-creator of Terminal-Bench, said, "We firmly believe that 95% of large language models (LLMs) will interact with computers through interfaces similar to terminals in the future."
At the same time, established AI code editors face considerable challenges. Windsurf has gone through a series of acquisitions, leaving the company's future uncertain. New research also suggests that programmers overestimate the productivity gains these tools offer. For example, a study by METR found that although developers believed Cursor Pro made them 20% to 30% faster, observed task completion was actually nearly 20% slower.
Against this backdrop, companies like Warp have risen rapidly. Warp's high scores on Terminal-Bench have made it a top contender among terminal tools. Its founder, Zach Lloyd, is confident in the terminal, which he sees as an ideal place to solve problems that code editors struggle with.
Part of what sets the new approach apart is how its performance is benchmarked. Traditional tools usually focus on resolving code issues from GitHub, while terminal tools take a broader view that spans both writing code and DevOps tasks. For instance, one Terminal-Bench task asks the AI to reverse-engineer a compression algorithm, while another asks it to build the Linux kernel from source. Tasks like these call for the kind of persistent, step-by-step problem-solving expected of a human programmer.
Although current terminal tools have not yet reached their full potential, Lloyd believes they can already handle many of a developer's non-coding tasks, which he sees as a promising sign for the approach.