One Flexible Tool Beats a Hundred Dedicated Ones



At the beginning of 2026, when you wanted an LLM agent to talk to a system, the default move was to install its MCP server.

GitHub. Jira. Comfort. Linear. Postgres. Neo4j. Each ships a server that exposes a structured menu of tools (create_issue, list_pull_requests, merge_pull_request, get_repository, search_code, and so on), and you point your agent at it.

It's a smooth ride. And, for a surprising amount of real work, it's the wrong setup.

The thesis is short: MCP architectures often wrap each service as a set of dedicated tools; a CLI gives the agent one really flexible tool. With today's models, the flexible tool wins.

Comparison of MCP vs CLI methods.

These two setups ask the model to do different jobs. With a host of dedicated tools, the agent has one job: select the correct tool from the menu. With one flexible tool, it has to work out how to put the pieces together. That second job used to be hard. Models would hallucinate flags, lose the thread on long pipelines, and misread help text, so wrapping every capability in a pre-baked tool was a reasonable defense. That is no longer true. Today's models read a --help page or a SKILL.md when needed, know the canonical CLIs from training, compose bash unsupervised, and retry when a flag is wrong. The hard part became easy, the easy part stayed easy, and all those neatly wrapped tools now mostly just fill the model's context for no benefit.

Of course it's not all roses and sunshine. Giving the agent a terminal also gives it a much larger blast radius. The same flexibility that lets it compose gh | jq | xargs into something useful also lets a prompt injection talk it into something far worse than a hostile Cypher query. So yes, there are trade-offs, and you have to think about them (sandbox, allowlist, separate OS user, read-only role in the database, the usual stuff).

But when you can give the agent a terminal in a reasonably safe way, the flexible side still comes out ahead.

Where the CLI shines

The same pattern of “wrapping the service as a bunch of dedicated tools” shows up everywhere MCP does. Postgres MCPs vs. psql. Kubernetes MCPs vs. kubectl. Filesystem MCPs vs. cat, ls, mv, grep joined with pipes. Same shape every time, same CLI counterpart every time. And the same three failure modes too, because they are not tied to any one product.

Nothing in the MCP spec requires this pattern of piling up dedicated tools. The protocol asks for typed tools and nothing more; it says nothing about how small each tool should be. Implementations tend to gravitate towards many small tools for historical reasons. You can build flexible tools that take a single expressive input and let the agent shape it however it needs, and most of the time you should.

To make it concrete, we'll look at an example that pits the Neo4j MCP server against the Neo4j CLI.

Disclaimer up front: I work at Neo4j. That makes this the convenient choice of example, but the lessons apply to many other CLIs.

The Neo4j MCP server is an official server that exposes Neo4j to agents through MCP, shipping a handful of dedicated tools for read queries, write queries, and schema retrieval. neo4j.sh is the official command-line interface for Neo4j, a single binary that you run in a terminal with profiles that authenticate each database you talk to. To keep the comparison honest, we will only look at the read query and schema tools on the MCP side versus the equivalent query command in neo4j.sh. Same functions, same database, same Cypher running over the wire. The only thing that changes is whether the agent reaches them through a typed tool schema or through a string passed to the shell.

Querying every environment

We've already seen how a bunch of dedicated tools eats up the context window with definitions, and some servers now ship deferred tools, which postpone that cost until the agent reaches for them. But there's a second multiplier that nobody talks about: what happens when you want to talk to more than one instance of the same service. With MCP, the tool count grows not only with features but also with environments.

Connecting to multiple databases via MCP or CLI.

Say the agent wants node counts from dev, staging, and prod. With MCP, you stand up one neo4j-mcp-server per environment, each keeping its own four tool schemas in the agent's context at all times. Three environments means twelve schemas in the model's window, the same four schemas three times over, before the agent does anything.

With the CLI, it is a for loop:

$ for c in dev staging prod-ro; do
    neo4j-cli query -c "$c" --format toon \
      "MATCH (n) RETURN count(n) AS nodes"
  done

One binary, three authentication profiles, zero context cost per iteration. Adding a fourth environment is one more credential dbms add, not one more MCP server process. The same fan-out extends to any “access N similar things” workflow you might want: taking a snapshot before a risky deployment, diffing the schema between staging and prod, running a health check on every database the agent knows about.
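As a rough illustration of comparing schemas between two environments, one of the workflows listed above, here is a minimal sketch. The fetch_labels function is a hypothetical stub with invented canned output, standing in for a real per-environment CLI call (the post does not show a schema command, so none is assumed here); labels_only_in is likewise an illustrative name. Only sort, comm, and mktemp are real tools.

```shell
# Hypothetical stub: stands in for "ask environment $1 for its node labels".
# The canned lists below are invented sample data, not real schema output.
fetch_labels() {
  case "$1" in
    staging) printf '%s\n' Account Merchant Device ;;
    prod-ro) printf '%s\n' Account Merchant ;;
  esac
}

# Print labels that exist in environment $1 but not in environment $2.
labels_only_in() {
  left=$(mktemp)
  right=$(mktemp)
  fetch_labels "$1" | sort > "$left"
  fetch_labels "$2" | sort > "$right"
  comm -23 "$left" "$right"   # keep column 1 only: lines unique to the first file
  rm -f "$left" "$right"
}

labels_only_in staging prod-ro
```

With the canned data this prints Device, the one label staging has that prod-ro lacks. Both label lists pass through temporary files and never through the agent; only the difference is printed.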

Chaining queries

Say an agent is investigating a known fraudulent account: starting from a single seed account, find every account it has transacted with, then find which other accounts those counterparties most often transact with. Two queries against the same database, where the second takes the first's result as its parameter.

With MCP, the model has to be the pipe. It calls read-cypher, the result comes back as a list of, say, 80 counterparty IDs, those 80 IDs now sit in the model's context, the model formats them into a parameter for a second read-cypher call, and only then does the second query run. The intermediate list rides through the conversation verbatim, and every additional ID is another line of context the agent pays to read whether it needs it or not.

With the CLI, the pipe is a real |:

$ neo4j-cli query -c prod-ro --format json \
    --param "seed=acct_19f3" \
    "MATCH (:Account {id: \$seed})-[:TRANSACTED]-(c:Account)
     WHERE c.id <> \$seed
     RETURN collect(DISTINCT c.id) AS counterparties" \
  | neo4j-cli query -c prod-ro --params-from-stdin \
      "MATCH (a:Account)-[:TRANSACTED]-(b:Account)
       WHERE a.id IN \$counterparties
         AND NOT b.id IN \$counterparties + ['acct_19f3']
       RETURN b.id, count(DISTINCT a) AS edges_into_cluster
       ORDER BY edges_into_cluster DESC LIMIT 20"

--params-from-stdin reads the JSON result of the previous query and binds it as parameters for the next one. The list of counterparties never enters the model's context, so the agent's token cost is the same whether the set has 5 or 500 counterparties.
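The flat-cost claim can be illustrated without a database. In this sketch both functions are hypothetical stand-ins, not neo4j-cli: one emits N invented IDs, the other consumes them from stdin and prints a single summary line, which is all a caller of the pipeline ever sees.

```shell
# Stand-in for the first query: prints $1 invented account IDs, one per line.
list_counterparties() {
  i=1
  while [ "$i" -le "$1" ]; do
    echo "acct_$i"
    i=$((i + 1))
  done
}

# Stand-in for the second query: consumes IDs on stdin, emits one summary line.
summarize() {
  count=$(wc -l | tr -d ' ')
  echo "counterparties=$count"
}

# The intermediate list travels through the pipe; only the summary is printed.
list_counterparties 5 | summarize
list_counterparties 500 | summarize
```

Both pipelines print exactly one line (counterparties=5 and counterparties=500), so whatever reads the output pays the same cost regardless of how many IDs went through the middle.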

This is where the shell starts to feel like a different class of tool altogether. The agent no longer picks from a menu of actions; it composes pipelines, and intermediate data never has to surface. A two-step question becomes a |. A fan-out becomes a for loop. A join across two databases becomes one query piped into another with --params-from-stdin. Each of those would be three or four MCP round trips with every intermediate result landing in the context window, and by then the agent has spent more tokens shuffling rows than thinking about them.

Pipelines across multiple CLIs

Same problem, bigger scale. Say the agent wants to load a project's latest GitHub issues into Neo4j: an :Issue node per ticket, a :User node per author, a :TAGGED relationship per label. The data lives in one CLI (gh), needs reshaping (jq does that), then lands in another CLI (neo4j-cli). Three different tools in one line. With MCP, you call the GitHub MCP server's issue-listing tool, every issue lands in the model's context, the model extracts the fields it wants, and write-cypher fires once per issue. Hundreds of round trips through the model, with every issue sitting in the conversation along the way.

With the CLI, three programs in the pipeline:

$ gh issue list --repo neo4j/neo4j --limit 100 \
    --json number,title,author,labels \
  | jq -c '.[]' \
  | while IFS= read -r issue; do
      neo4j-cli query --rw -c prod \
        --param "data=$issue" \
        "WITH apoc.convert.fromJsonMap(\$data) AS i
         MERGE (n:Issue {number: i.number}) SET n.title = i.title
         MERGE (u:User {login: i.author.login})
         MERGE (u)-[:OPENED]->(n)
         FOREACH (label IN i.labels |
           MERGE (l:Label {name: label.name})
           MERGE (n)-[:TAGGED]->(l))"
    done

gh pulls the issues, jq reshapes each into a single line of JSON, and the while loop hands each row to neo4j-cli as a Cypher parameter. The model writes this script once and walks away; data flows through bash, not through the agent. One hundred issues or ten thousand, the agent's token cost is the same.
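One shell detail matters when JSON rides through a while loop like the one above: plain read treats backslashes as escape characters and drops them, which corrupts JSON strings that contain escaped quotes, while read -r keeps each line intact. A minimal sketch with an invented sample line (read_raw and read_cooked are illustrative names; no gh or jq needed):

```shell
# Invented sample: one line of JSON whose title contains escaped quotes.
sample='{"number":1,"title":"Fix \"flaky\" test"}'

# Reads stdin line by line without touching backslashes or leading spaces.
read_raw() {
  while IFS= read -r line; do
    printf '%s\n' "$line"
  done
}

# Same loop without -r: read consumes each backslash as an escape character.
read_cooked() {
  while read line; do
    printf '%s\n' "$line"
  done
}

printf '%s\n' "$sample" | read_raw      # JSON comes out unchanged
printf '%s\n' "$sample" | read_cooked   # backslashes are gone; no longer valid JSON
```

Issue titles are user-supplied text, so escaped quotes and backslashes are likely to show up sooner or later in a real export.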

The shape generalizes well beyond GitHub. Swap gh for any other CLI that emits JSON (jira issue list, linear, curl against a webhook, whatever dump command is internal to you), swap the Cypher pattern for whatever graph you are building, and the pipeline still works. Two MCP tools do not compose with each other; two CLIs do, and so do ten.

The terminal is powerful, and that's the point

The terminal isn't a fixed menu; it's the most flexible tool you can give an agent, because it contains everything else in the toolbox.

That power cuts both ways. A flexible tool used incorrectly does flexible damage. With broad terminal access comes an obvious responsibility: sandbox the shell, allowlist the actions you really want, run the agent as a separate OS user, and scope down any role that could do something harmful. None of this is novel; it's just sysadmin hygiene applied to a fast-typing LLM. And if you can't do any of that, an MCP server with a small fixed footprint is still the right answer; a protocol-level guarantee that the agent cannot cat ~/.ssh/id_rsa is a real thing.
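The allowlist idea can be sketched as a small wrapper. This is an illustration under stated assumptions, not a sandbox: the ALLOWED list and the run_allowed name are hypothetical, the check looks only at the program name, and an allowed program can still do damage with the wrong arguments, so it complements OS-level isolation rather than replacing it.

```shell
# Hypothetical allowlist: the programs the agent is permitted to start.
ALLOWED="gh jq neo4j-cli"

# Run "$@" only if its first word is on the allowlist; otherwise refuse.
run_allowed() {
  cmd=$1
  for ok in $ALLOWED; do
    if [ "$cmd" = "$ok" ]; then
      "$@"          # execute with the original arguments, no eval involved
      return $?
    fi
  done
  echo "blocked: $cmd" >&2
  return 126
}

# cat is not on the list, so the file is never opened.
run_allowed cat "$HOME/.ssh/id_rsa" || echo "refused with status $?"
```

The wrapper takes the command as separate arguments and never passes a string through eval, so shell metacharacters inside an argument are not reinterpreted as new commands.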

The broader point holds even if you stay entirely within MCP. The terminal wins not because bash is special but because bash is the one tool with the most flexible input: pipes, variables, substitution, loops. That's the shape you should copy. Treat the terminal as the limiting case of MCP and design toward it: a few tools, each accepting expressive input, and an agent that does the composition work instead of having every combination wrapped in advance. Most MCP servers are a long list of narrow endpoints because that's how the underlying API was already built, not because the agent works better that way. The servers that age well will be the ones that chose a small, expressive surface on purpose.

All images on this blog are created by the author.

