For years, code-editing tools like Cursor, Windsurf and GitHub's Copilot have been the standard for AI-powered software development. But as agentic AI grows more powerful and vibe coding takes off, a subtle shift has changed how AI systems interact with software.
Instead of working on code, they are increasingly interacting directly with the shell of whatever system they are installed in. It is a significant change in how AI-powered software development happens, and despite its low profile, it could have major implications for where the field goes from here.
The terminal is best known as the black-and-white screen you remember from '90s hacker movies: a very old-school way of running programs and manipulating data. It is not as visually impressive as contemporary code editors, but it is an extremely powerful interface if you know how to use it. And while code-based agents can write and debug code, terminal tools are often needed to get software from written code to something that can actually be used.
The clearest sign of the shift to the terminal has come from the major labs. Since February, Anthropic, DeepMind and OpenAI have all released command-line coding tools (Claude Code, Gemini CLI and Codex CLI, respectively), and they are already among those companies' most popular products.
That shift has been easy to miss, since the new tools largely operate under the same branding as earlier coding tools. But under the hood, there have been real changes in how agents interact with other computers, both online and offline. Some believe those changes are just getting started.
“Our big bet is that there is a future in which 95% of LLM-computer interaction is through a terminal-like interface,” says Mike Merrill, co-creator of the leading terminal-focused benchmark, Terminal-Bench.
Terminal-based tools are also coming into their own just as prominent code-based tools are starting to look shaky. The AI code editor Windsurf has been torn apart by dueling acquisitions, with senior executives hired away by Google and the remaining company acquired by Cognition, leaving the consumer product's long-term future uncertain.
At the same time, new research suggests programmers may be overestimating the productivity gains from conventional tools. A METR study testing Cursor Pro, Windsurf's main competitor, found that while developers estimated they could complete tasks 20% to 30% faster, the observed process was nearly 20% slower. In short, the code assistant was actually costing programmers time.
That has left an opening for companies like Warp, which currently holds the top spot on Terminal-Bench. Warp bills itself as an “agentic development environment,” a middle ground between IDE programs and command-line tools like Claude Code.
But Warp founder Zach Lloyd is still bullish on the terminal, seeing it as a way to tackle problems that would be out of scope for a code editor like Cursor.
“The terminal occupies a very low level in the developer stack, so it is the most versatile place to be running agents,” Lloyd says.
To understand how the new approach differs, it helps to look at the benchmarks used to measure these tools. The code-based generation of tools focused on solving GitHub issues, the basis of the SWE-Bench test. Each problem on SWE-Bench is an open issue from GitHub: essentially, a piece of code that does not work.
Models iterate on the code until they find something that works, resolving the issue. Integrated products like Cursor have built more sophisticated approaches, but the GitHub/SWE-Bench model is still at the core of how these tools work: they start with broken code and turn it into code that works.
Terminal-based tools take a wider view, looking beyond the code to the whole environment in which a program runs.
In one Terminal-Bench problem, the instructions provide a decompression program and a target text file, challenging the agent to reverse-engineer a matching compression algorithm. Another asks the agent to build the Linux kernel from source, without mentioning that the agent will have to download the source code itself. Solving these problems requires the kind of bull-headed problem-solving ability that programmers need.
“What makes Terminal-Bench hard is not just the questions that we are giving the agents,” says Terminal-Bench co-creator Alex Shaw. “It's the environments that we are placing them in.”
Crucially, this new approach means tackling a problem step by step, the same skill that makes agentic AI so powerful. But even state-of-the-art agentic models cannot handle all of these environments. Warp earned its high score on Terminal-Bench by solving just over half of the problems, a mark of how challenging the benchmark is and how much work remains to unlock the terminal's full potential.
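To make the step-by-step idea concrete, here is a minimal Python sketch of the general pattern: run one shell command at a time, record what happened, and stop with an explanation at the first failure. It is an illustration only, not how Warp, Claude Code or any Terminal-Bench agent is actually implemented. The `run_steps` helper and the example commands are hypothetical, and a real agent would choose each next command from the previous output rather than follow a fixed list.

```python
import subprocess
import sys


def run_steps(steps):
    """Run commands in order; stop at the first failure and keep a log.

    Hypothetical helper for illustration. Each log entry is
    (command, exit_code, stdout, stderr).
    """
    log = []
    for cmd in steps:
        result = subprocess.run(cmd, capture_output=True, text=True)
        log.append((cmd, result.returncode,
                    result.stdout.strip(), result.stderr.strip()))
        if result.returncode != 0:
            # Stop here so the caller can report why the task failed.
            return False, log
    return True, log


# Made-up stand-ins for real setup commands. They invoke the current
# Python interpreter so the sketch runs on any platform.
steps = [
    [sys.executable, "-c", "print('checking interpreter')"],
    [sys.executable, "-c", "import json; print('dependency ok')"],
    [sys.executable, "-c", "import sys; sys.exit('missing dependency: example')"],
    [sys.executable, "-c", "print('never reached')"],
]

ok, log = run_steps(steps)
for cmd, code, out, err in log:
    print(f"exit={code} stdout={out!r} stderr={err!r}")
```

In this sketch the third command fails, so the fourth never runs and the log ends with the error message from the failing step.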
Still, Lloyd believes we are already at a point where terminal-based tools can reliably handle much of a developer's non-coding work, a value proposition that is hard to ignore.
“If you think of the daily work of setting up a new project, figuring out the dependencies and getting it runnable, Warp can pretty much do that autonomously,” says Lloyd. “And if it can't do it, it will tell you why.”