How a Protocol Cleaned Up Our Agent Architecture

A few weeks ago someone from the data team asked if we could update the database schema that was being populated by one of the tools in our multi-agent system. The update was simple: two new columns added to a table.
The tool definition lived in the agent orchestrator. A second, identical copy lived in the authentication agent. A third, slightly different and outdated copy sat in a helper module that someone wrote about three sprints ago. Human-in-the-loop approval logic was wired directly into the edges of the graph, one custom implementation per tool. Changing the schema meant touching four files, retesting each agent separately, and hoping nothing silently broke.
We fixed it, but it raised one important question: why did we build it this way?
The honest answer is that we had no choice. Tool calling with LangGraph is a local problem by design. You define the tools where you need them, call them where you call them, and you own all the plumbing. This is manageable if you only have two agents, but it becomes a problem when seven agents share tools and overlapping human approval gates.
After doing some research, we decided that instead of defining tools locally for every agent, we should host all our tools in one shared place that any agent can use.
In this article
- What is MCP?
- Creating an MCP server
- Stdio vs HTTP
- Connecting it to LangGraph
- Human-in-the-loop at the protocol boundary
- What can break in production and why?
- The impact of MCP on our Agentic system
- The conclusion
What is MCP?
The Model Context Protocol is an open standard published by Anthropic in late 2024. It standardizes how an AI agent discovers and invokes tools. Instead of defining tools within the orchestrator, you run them on a separate server. The agent connects to that server at runtime, asks what tools are available, and gets the list back.
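To make the discovery step concrete, here is a minimal sketch of the JSON-RPC 2.0 messages involved. The method names tools/list and tools/call and the name, description and inputSchema fields come from the MCP specification; the tool entry, the ids and the build_call_request helper are illustrative assumptions, and the initialize handshake that precedes these messages is left out.

```python
import json

# Request a client sends to ask the server which tools it offers.
list_request = {"jsonrpc": "2.0", "id": 1, "method": "tools/list"}

# Illustrative response. The single tool entry is made up for this sketch.
list_response = {
    "jsonrpc": "2.0",
    "id": 1,
    "result": {
        "tools": [
            {
                "name": "run_analysis",
                "description": "Executes a Python snippet against live data.",
                "inputSchema": {
                    "type": "object",
                    "properties": {
                        "code": {"type": "string"},
                        "dataset": {"type": "string"},
                    },
                    "required": ["code", "dataset"],
                },
            }
        ]
    },
}


def build_call_request(request_id: int, name: str, arguments: dict) -> dict:
    """Build the tools/call request for one of the discovered tools."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }


# The client learns tool names at runtime instead of hard-coding them.
discovered = [tool["name"] for tool in list_response["result"]["tools"]]
call = build_call_request(2, discovered[0], {"code": "output = 1", "dataset": "sales"})
print(json.dumps(call, indent=2))
```

In practice the SDK and the adapter library build and parse these messages for you; the point is only that the agent needs nothing but the server's answer to know what it can call.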
A senior engineer reading this article will immediately ask: couldn't I just create a central tool registry and inject it into each agent at startup? I wondered the same thing, and on another system I used a tool registry instead of MCP.
Yes, you can, and if you already have something like this working, MCP is not an emergency. What a bespoke registry doesn't offer is an interoperability boundary. MCP is a protocol, not a library. Any MCP-compatible client can connect to your server: LangGraph today, a different framework next year. A TypeScript client can call your Python server without any additional glue code. A tool registry does not give you that.
There is also the question of team ownership. In our case the ML team manages the tools and the application team manages the graph. MCP gave them a clean contract without a shared codebase.
Creating an MCP Server
An MCP server can expose three things: tools (callable actions), resources (read-only data), and prompts (reusable templates). In an agent system that needs to take actions, tools are the main concern.
The Python SDK ships with FastMCP, which handles schema generation from type hints and manages the protocol lifecycle. You write the function, decorate it with the tool decorator, and the server takes care of the rest.
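To give a feel for what "schema generation from type hints" means, here is a simplified sketch that derives a JSON Schema style input description from a function signature. It is an illustration only, not how FastMCP is implemented; sketch_input_schema, the type mapping and the limit parameter are assumptions made up for this example.

```python
import inspect
from typing import get_type_hints

# Simplified mapping from Python types to JSON Schema type names.
_JSON_TYPES = {
    str: "string",
    int: "integer",
    float: "number",
    bool: "boolean",
    dict: "object",
    list: "array",
}


def sketch_input_schema(func) -> dict:
    """Derive a minimal input schema from a function's type hints."""
    hints = get_type_hints(func)
    hints.pop("return", None)
    properties = {}
    required = []
    for name, param in inspect.signature(func).parameters.items():
        # Unknown or missing annotations fall back to "string" in this sketch.
        properties[name] = {"type": _JSON_TYPES.get(hints.get(name), "string")}
        # Parameters without a default value are mandatory.
        if param.default is inspect.Parameter.empty:
            required.append(name)
    return {"type": "object", "properties": properties, "required": required}


async def run_analysis(code: str, dataset: str, limit: int = 100) -> dict:
    """Toy stand-in for the real tool; 'limit' is a hypothetical parameter."""
    return {}


schema = sketch_input_schema(run_analysis)
print(schema)
```

The practical takeaway is that precise type hints are part of the tool's contract: whatever the model sees as the argument schema is derived from them.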
One thing that catches people off guard with the stdio transport: never write to stdout. The MCP protocol uses stdout as its communication channel. Any stray print() call will corrupt the message stream in ways that are very confusing to debug.
import sys
import logging

from mcp.server.fastmcp import FastMCP

# Log to stderr: with the stdio transport, stdout carries protocol messages.
logging.basicConfig(level=logging.INFO, stream=sys.stderr)
logger = logging.getLogger("analyst-tools")

mcp = FastMCP("analyst-tools")

# execute_in_sandbox and persist_result are application helpers that are
# defined elsewhere in the project and not shown in this article.


@mcp.tool()
async def run_analysis(code: str, dataset: str) -> dict:
    """
    Executes a Python snippet against live data and returns the result.
    Use when the user wants to compute aggregates, filter records,
    or derive insights. The code must assign its final output to a
    variable named 'output'.

    Args:
        code: Python code to execute.
        dataset: One of 'sales', 'inventory', 'pipeline'.
    """
    logger.info(f"run_analysis | dataset={dataset}")
    return await execute_in_sandbox(code, dataset)


@mcp.tool()
async def write_to_db(table: str, payload: dict) -> dict:
    """
    Persists a result record to the analyst results table.
    Only call this after run_analysis has returned a verified output.

    Args:
        table: Target table name.
        payload: Key-value pairs to write as a new record.
    """
    logger.info(f"write_to_db | table={table}")
    return await persist_result(table, payload)


if __name__ == "__main__":
    mcp.run(transport="stdio")
The docstrings are what the LLM reads when the agent decides which tool to call, so writing a good docstring is very important.
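The stdout warning from earlier in this section can be reproduced in miniature. The stdio transport exchanges newline-delimited JSON-RPC messages, so the sketch below parses a stream line by line and shows what one stray print() does to it. read_messages is a hypothetical stand-in for a client's reader; real clients may react to a bad line differently, for example by logging and skipping it.

```python
import json


def read_messages(stream_text: str) -> list[dict]:
    """Parse newline-delimited JSON-RPC messages, as a stdio client would."""
    return [json.loads(line) for line in stream_text.splitlines() if line.strip()]


clean = '{"jsonrpc": "2.0", "id": 1, "result": {}}\n'
# What the stream looks like after a stray print() inside a tool.
polluted = "loading dataset sales...\n" + clean

print(read_messages(clean))

try:
    read_messages(polluted)
except json.JSONDecodeError as exc:
    print(f"client cannot parse the stream: {exc}")
```

This is why the server above sends all logging to stderr.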
Stdio vs HTTP
This decision comes up in every production deployment, and many articles skip it.
Stdio runs the server as a subprocess of the client. Communication takes place through standard input and output. The latency is in the single-digit milliseconds, there is no network involved, and the setup is minimal. It is the right choice for local development, single-machine use, or wherever the server and client live in the same process tree.
Streamable HTTP runs the server as an independent service. Use this if the server needs to be shared across clients or multiple machines, if you want to deploy it as a container, or if you need to scale horizontally. Serverless deployments like Cloud Run work well here. Stdio doesn't fit the serverless model at all because it depends on a long-lived parent process.
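Switching between these in FastMCP is a one-line change to the transport argument:
mcp.run(transport="streamable-http")
The tools and everything else remain the same. One caveat: in the official Python SDK's FastMCP (the mcp.server.fastmcp import used above), as of this writing, host and port are server settings passed to the constructor, for example FastMCP("analyst-tools", host="0.0.0.0", port=8080), rather than arguments to mcp.run(). Check the signature in the SDK version you have installed.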
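Because the transport is the only thing that differs between environments, one option is to pick it from configuration rather than editing the entry point. The sketch below assumes a hypothetical MCP_TRANSPORT environment variable and a hypothetical pick_transport helper; neither is part of the SDK.

```python
import os

VALID_TRANSPORTS = ("stdio", "streamable-http")


def pick_transport(env=None) -> str:
    """Return the transport named in MCP_TRANSPORT, defaulting to stdio."""
    env = os.environ if env is None else env
    transport = env.get("MCP_TRANSPORT", "stdio")
    if transport not in VALID_TRANSPORTS:
        raise ValueError(
            f"Unsupported transport {transport!r}; expected one of {VALID_TRANSPORTS}"
        )
    return transport


# In the server entry point this would become:
#     mcp.run(transport=pick_transport())
print(pick_transport({}))
print(pick_transport({"MCP_TRANSPORT": "streamable-http"}))
```

Local development then keeps the stdio default, while the container image sets the variable once.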
For data residency requirements, an on-premises MCP server with tools that never touch an external API gives you a clean story for your compliance team. The protocol does not care where the server is running.
Connecting It to LangGraph
The langchain-mcp-adapters library manages the underlying connection lifecycle, performs the tool discovery handshake, and translates MCP tool schemas into LangChain-compatible tool objects.
from langchain_mcp_adapters.client import MultiServerMCPClient
from langgraph.graph import StateGraph, MessagesState, START
from langgraph.prebuilt import ToolNode, tools_condition
from langchain_google_vertexai import ChatVertexAI

llm = ChatVertexAI(
    model="gemini-2.5-flash",
    temperature=0,
    max_tokens=None,
)


async def run(query: str):
    # In langchain-mcp-adapters 0.1.0 and later the client is created
    # directly; it is no longer used as an async context manager.
    client = MultiServerMCPClient({
        "analyst-tools": {
            "command": "python",
            "args": ["./mcp_server.py"],
            "transport": "stdio",
        }
    })

    # Discover the server's tools and convert them to LangChain tools.
    tools = await client.get_tools()
    llm_with_tools = llm.bind_tools(tools)

    def agent_node(state: MessagesState):
        return {"messages": [llm_with_tools.invoke(state["messages"])]}

    # Standard agent loop: agent -> tools -> agent until no tool calls remain.
    graph = StateGraph(MessagesState)
    graph.add_node("agent", agent_node)
    graph.add_node("tools", ToolNode(tools))
    graph.add_edge(START, "agent")
    graph.add_conditional_edges("agent", tools_condition)
    graph.add_edge("tools", "agent")
    app = graph.compile()

    result = await app.ainvoke({
        "messages": [{"role": "user", "content": query}]
    })
    print(result["messages"][-1].content)
tools_condition is a prebuilt LangGraph routing function that checks whether the last message contains tool calls. If it does, it routes to the tools node; if not, the graph ends. Using it instead of writing your own routing function matters because it handles edge cases that a hand-rolled implementation tends to miss.
One behavior to be aware of: by default, MultiServerMCPClient creates a new MCP session for each tool call. With one request making five consecutive tool calls, that's five handshakes. That's fine for stdio on the same machine, but it's noticeable over HTTP transport with a remote server. For production workloads with chained tool calls, use async with client.session("analyst-tools") to pin multiple calls to one session.
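The handshake arithmetic is easy to see in a toy model. The sketch below does not use the adapter library at all: FakeServer and both helper functions are hypothetical stand-ins that only count how many sessions get opened for the same five tool calls.

```python
import asyncio


class FakeServer:
    """Toy stand-in for a remote MCP server that counts handshakes."""

    def __init__(self):
        self.handshakes = 0

    async def open_session(self):
        self.handshakes += 1  # each new session costs one handshake
        return self

    async def call_tool(self, name: str) -> str:
        return f"{name}: ok"


async def call_with_new_session_each_time(server, names):
    """Open a fresh session for every tool call."""
    results = []
    for name in names:
        session = await server.open_session()
        results.append(await session.call_tool(name))
    return results


async def call_with_pinned_session(server, names):
    """Open one session and reuse it for the whole chain of calls."""
    session = await server.open_session()
    return [await session.call_tool(name) for name in names]


CALLS = ["run_analysis"] * 4 + ["write_to_db"]

per_call = FakeServer()
asyncio.run(call_with_new_session_each_time(per_call, CALLS))

pinned = FakeServer()
asyncio.run(call_with_pinned_session(pinned, CALLS))

print(per_call.handshakes, pinned.handshakes)  # prints: 5 1
```

With a remote server, each of those extra handshakes is a network round trip or more, which is where the difference becomes visible.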
Human-in-the-Loop at the Protocol Boundary
Before MCP, our approval gate lived in the graph. We used interrupt_before on certain nodes, wired custom approval logic into the edges of the graph, and updated the UI every time a new sensitive tool was added. It worked, but it also meant that adding a tool that needed approval was a three-team effort.
After MCP, the gate moved into a single layer between the LangGraph client and the MCP server. Any tool call that matches the sensitivity policy hits the gate before it reaches the server. The graph knows nothing about it.
SENSITIVE_TOOLS = frozenset({"write_to_db", "send_notification", "trigger_webhook"})


async def gated_call(tool_name: str, arguments: dict, execute) -> dict:
    # 'execute' is the coroutine function that forwards the call to the server.
    if tool_name in SENSITIVE_TOOLS:
        # In production: push to Slack / internal UI / audit queue.
        # input() blocks the event loop, so it is only suitable for a demo.
        print(f"\nAPPROVAL REQUIRED {tool_name}")
        print(f"Arguments: {arguments}")
        decision = input("Approve? (y/n): ").strip().lower()
        if decision != "y":
            return {
                "status": "rejected",
                "reason": f"Operator declined '{tool_name}'."
            }
    return await execute(tool_name, arguments)
SENSITIVE_TOOLS is a single set, consulted on every tool call regardless of which agent triggered it. A new sensitive tool has been added to the server? Add its name to this set. The graph does not change. The approval UI does not change. In our internal system we load this set from a configuration file at startup, so the product and compliance teams can review it without a code deployment.
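Loading the set from a file could look roughly like the sketch below. The JSON layout, the sensitive_tools key, the approval_policy.json file name and the load_sensitive_tools helper are all assumptions for illustration; our actual configuration format is not shown in this article.

```python
import json
import tempfile
from pathlib import Path


def load_sensitive_tools(path) -> frozenset:
    """Read the approval policy file and return the gated tool names."""
    data = json.loads(Path(path).read_text(encoding="utf-8"))
    names = data.get("sensitive_tools", [])
    if not all(isinstance(name, str) for name in names):
        raise ValueError("sensitive_tools must be a list of tool names")
    return frozenset(names)


# Demo: write a policy file to a temporary directory, then load it at "startup".
with tempfile.TemporaryDirectory() as tmp:
    config_path = Path(tmp) / "approval_policy.json"
    policy = {"sensitive_tools": ["write_to_db", "send_notification", "trigger_webhook"]}
    config_path.write_text(json.dumps(policy), encoding="utf-8")
    SENSITIVE_TOOLS = load_sensitive_tools(config_path)

print(sorted(SENSITIVE_TOOLS))
```

Keeping the policy in a plain file means a reviewer can read a diff of tool names without reading any Python.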
What Can Break in Production and Why?
The server crashes during operation. The client will receive an error on the next tool call. LangGraph's ToolNode reports this back to the LLM as a tool error message. Whether the model recovers or gives up depends on how your system is set up to handle that error. At the very least, log the subprocess stderr separately so you can see what killed the server; otherwise debugging is guesswork.
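For call paths that do not go through ToolNode, such as the execute callable passed to gated_call, the same idea can be applied by hand: catch the failure, log it, and return a structured error instead of letting the exception escape. safe_call, the tool-client logger name and the simulated crash below are hypothetical names for this sketch.

```python
import asyncio
import logging
import sys

logging.basicConfig(level=logging.INFO, stream=sys.stderr)
logger = logging.getLogger("tool-client")


async def safe_call(execute, tool_name: str, arguments: dict) -> dict:
    """Run a tool call and turn any failure into a structured error result."""
    try:
        return await execute(tool_name, arguments)
    except Exception as exc:
        # Record what failed so a crash is diagnosable after the fact.
        logger.error(
            "tool call failed | tool=%s | %s: %s",
            tool_name, type(exc).__name__, exc,
        )
        return {
            "status": "error",
            "tool": tool_name,
            "reason": f"{type(exc).__name__}: {exc}",
        }


async def crashed_server(tool_name: str, arguments: dict) -> dict:
    """Simulates a server whose process has already exited."""
    raise ConnectionError("MCP server process exited")


result = asyncio.run(safe_call(crashed_server, "run_analysis", {}))
print(result["status"], "|", result["reason"])
```

The structured result gives the model something it can reason about, and the log line gives you something to search for.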
The LLM calls the wrong tool. MCP does not protect you from this. If your tool descriptions are unclear or overlap, the model will make incorrect routing decisions. We spent a lot of time fixing the docstrings on our server, mainly because a poorly worded description was causing write_to_db to be called before run_analysis had finished. Treat tool descriptions as a prompt engineering problem.
An approval gate stalls a long-running workflow. If someone needs to authorize a tool call and it takes five minutes, the agent graph sits there waiting. LangGraph supports persisting graph state through checkpointing, so you can let the process exit and resume when a decision is reached. That's more involved than what's shown here, but it's the proper structure for a workflow that can't block a thread forever.
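The shape of that pattern (park the request, let the process exit, resume on a decision) can be sketched without LangGraph. The code below is a toy model that persists pending approvals to a JSON file; it is not LangGraph's checkpointing API, and park_request, resume_request and the file layout are assumptions for illustration.

```python
import json
import tempfile
from pathlib import Path


def park_request(store: Path, request_id: str, tool_name: str, arguments: dict) -> None:
    """Persist a tool call that is waiting for approval, then return at once."""
    pending = json.loads(store.read_text(encoding="utf-8")) if store.exists() else {}
    pending[request_id] = {"tool": tool_name, "arguments": arguments}
    store.write_text(json.dumps(pending), encoding="utf-8")


def resume_request(store: Path, request_id: str, approved: bool) -> dict:
    """Load a parked request once the operator has decided, and clear it."""
    pending = json.loads(store.read_text(encoding="utf-8"))
    request = pending.pop(request_id)
    store.write_text(json.dumps(pending), encoding="utf-8")
    if not approved:
        return {"status": "rejected", "tool": request["tool"]}
    return {"status": "approved", **request}


with tempfile.TemporaryDirectory() as tmp:
    store = Path(tmp) / "pending_approvals.json"
    # First process: park the call and exit instead of blocking on input().
    park_request(store, "req-1", "write_to_db", {"table": "results"})
    # Later, possibly in another process: the decision arrives.
    outcome = resume_request(store, "req-1", approved=True)

print(outcome)
```

A real implementation would resume the graph from its saved state rather than just returning the parked call, but the division of labour is the same: nothing holds a thread while a human decides.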
The impact of MCP on our Agentic system
We have moved seven tools to the server, three of which sit behind an approval gate. The orchestrator that calls them has no idea which ones are gated.
We have completely eliminated tool duplication. run_analysis is now defined in exactly one place and used by seven concurrent workflows. To update the output schema, we make the change on the server and every client picks it up.
Adding new capabilities became quick. For example, we added a generate_visualisation tool the following week, and the agent was using it the next day. No orchestrator changes were needed.
We ended up with one team owning the tools, another team managing the graph, and a clear contract between them. When the analyst team wants new capabilities, they talk to the ML team about the server, not the application team and not the graph team.
I want to be clear about what MCP does not fix. It will not make unreliable tools reliable. It won't help the LLM make better routing decisions if your tool descriptions are bad. And it doesn't replace observability: you still need to log tool calls and trace execution paths. The architecture makes these things easier, but the work is still yours.
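As one example of the observability work that remains, a small decorator can record every tool invocation with its outcome and duration. logged_tool and the tool-audit logger name are hypothetical, and the run_analysis body is a stub; how such a decorator composes with @mcp.tool() is an assumption you should verify against your SDK version.

```python
import asyncio
import functools
import logging
import sys
import time

logging.basicConfig(level=logging.INFO, stream=sys.stderr)
logger = logging.getLogger("tool-audit")


def logged_tool(func):
    """Log the name, outcome and duration of each call to an async tool."""

    @functools.wraps(func)  # keeps the name, docstring and signature visible
    async def wrapper(*args, **kwargs):
        started = time.perf_counter()
        status = "error"
        try:
            result = await func(*args, **kwargs)
            status = "ok"
            return result
        finally:
            elapsed_ms = (time.perf_counter() - started) * 1000
            logger.info(
                "tool=%s status=%s elapsed_ms=%.1f",
                func.__name__, status, elapsed_ms,
            )

    return wrapper


@logged_tool
async def run_analysis(code: str, dataset: str) -> dict:
    """Stub standing in for the real tool."""
    return {"output": 42, "dataset": dataset}


result = asyncio.run(run_analysis("output = 42", "sales"))
print(result)
```

Because the log goes to stderr, this stays safe under the stdio transport as well.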
The conclusion
By switching to MCP and moving tools from our local agent orchestrator to a dedicated server, we cleaned up our codebase, separated our engineering concerns, and made the entire agent system easier to work with.
Thanks to this change, our ML team can now deploy and modify the tools independently without touching the application graph.
If you enjoyed this deep dive into MCP, I would encourage you to check out my ongoing series on enterprise knowledge base RAG with hybrid search and on reranking in production RAG.


