Tool Calling, Explained: How AI Agents Decide What to Do Next

In my latest post, we looked at how to get structured, machine-readable responses from an LLM using JSON mode, function calling, and structured outputs. In that post, we briefly touched on function calling, approaching it as a way to get structured responses. However, function calling goes well beyond getting structured data out of a model, because it is the core of AI agent workflows. Therefore, in today's post, we will take a closer look at this topic.

In all the examples we've covered so far, the LLM has been used as a passive responder, meaning it receives a question and produces an answer, and that's it. But what if we want the LLM not only to respond with something but to actually do something? Or to put it more precisely, what if we want an action to be triggered based on the model's response? This action can be anything: look up live data, send a message, query a database, call an external API, and so on.

This is made possible by tool calling. Tool calling is what transforms an LLM from a very clever text generator into something that can trigger actions and interact with the world around it.

So, let's take a look!

What is Tool Calling?

Tool calling (also called function calling) is how an LLM can request the execution of external functions or APIs as part of generating its response. In other words, instead of just returning text, the model can ask for a specific function to be called with specific arguments in response to a user request.

The main thing to understand here is that the model itself does not execute the tool. It only decides which tool to call and with what arguments. The actual execution of the selected tool happens in our code, in the application that calls the AI model. We then feed the result of the tool back to the AI model, which uses it to generate the final response to the user.

This is the tool calling loop, which includes the following steps:

A user sends a message
The AI model takes the message as input and produces an output, which is a decision about which tool to call and with which arguments.
The model response containing the selected tool and its arguments is returned to our code. The code, without any involvement of the AI model, executes the chosen tool with the chosen arguments. This execution produces some kind of result (e.g. a calculation, information obtained from an API, etc.), and this result is then passed back to the AI model.
The AI model takes as input the result of the tool and generates a final response to the user based on that.
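
The four steps above can be sketched without any API call. In the minimal sketch below, `fake_model` is a hypothetical stand-in for the LLM and `get_time` is a made-up tool with a hard-coded result; neither comes from a real library. The message format is also simplified. The only point is to show which side does what: the model decides, our code executes.

```python
import json

def fake_model(messages):
    """Hypothetical stand-in for an LLM: requests a tool first, then answers."""
    last = messages[-1]
    if last["role"] == "tool":
        # Step 4: the tool result is in the history, so produce the final text
        result = json.loads(last["content"])
        text = f"It is {result['time']} in {result['city']}."
        return {"role": "assistant", "content": text, "tool_call": None}
    # Step 2: decide which tool to call and with which arguments
    return {
        "role": "assistant",
        "content": None,
        "tool_call": {"name": "get_time", "arguments": json.dumps({"city": "Athens"})},
    }

def get_time(city):
    """Made-up tool with a hard-coded result, so the sketch runs offline."""
    return {"city": city, "time": "12:00"}

TOOLS = {"get_time": get_time}

def run_once(user_text):
    messages = [{"role": "user", "content": user_text}]  # step 1: user message
    reply = fake_model(messages)                         # step 2: model decides
    messages.append(reply)
    call = reply["tool_call"]
    if call is None:
        return reply["content"]
    # Step 3: our code, not the model, runs the tool
    result = TOOLS[call["name"]](**json.loads(call["arguments"]))
    messages.append({"role": "tool", "content": json.dumps(result)})
    return fake_model(messages)["content"]               # step 4: final answer

print(run_once("What time is it in Athens?"))
```

With a real model, `fake_model` would be replaced by an API request, but the division of labour stays the same.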

Again, the model generates the tool call, not the tool execution. These are two very different things, and mixing them up is one of the most common sources of confusion.

But what exactly is a tool call? Basically, it means that the model returns a structured, machine-readable response using function calling, as we saw in the previous post. In this response, the content is None; there is no natural language answer, just a structured command indicating which tool to call and with which arguments. It is only after we run the tool and pass the result back that the model produces the actual text response for the user.

But let's see this in action!

We'll start with a simple example using one tool and one call, and then build up to some more interesting scenarios.

1. One tool: weather API

I think the most common example of using tools with AI that comes to mind is the weather API (a classic case of needing live data), so let's imagine building a weather assistant. In particular, we want a setup where the user asks about the weather, and instead of letting the AI model make something up (which the model will do very happily 🙃), we want it to call a real weather function and get real weather data for a given place from outside the LLM. To get weather data, I'll use Open-Meteo, a free, open-source weather API that happily doesn't require an API key.

To use a tool, we must first declare it in the tools list.

from openai import OpenAI
import json

client = OpenAI(api_key="your_api_key")

# Step 1: define the tool
tools = [
    {
        "type": "function",
        "function": {
            "name": "get_current_weather",
            "description": "Get the current weather for a given city",
            "parameters": {
                "type": "object",
                "properties": {
                    "city": {
                        "type": "string",
                        "description": "The name of the city, e.g. Athens"
                    },
                    "unit": {
                        "type": "string",
                        "enum": ["celsius", "fahrenheit"],
                        "description": "The temperature unit to use"
                    }
                },
                "required": ["city"]
            }
        }
    }
]

Note that the actual implementation of the tool (the weather API) is not mentioned anywhere so far. Instead, the model decides which tool to call based on three things: the description of the function (“Get the current weather for a given city”), the parameter descriptions (“The name of the city, e.g. Athens”), and the enforced schema. From this information alone, the model determines whether this is the right tool to call for a given user message, and with which arguments. Therefore, writing clear and accurate descriptions when defining our tools is very important for the model to successfully identify and call the correct tool based on user input.
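
The same schema also gives our own code something to check the model's arguments against before running anything. Below is a minimal sketch of such a check. `validate_arguments` is a hypothetical helper, not part of the OpenAI SDK, and it only covers `required` and `enum`; a full JSON Schema validator (for example the `jsonschema` package) would cover much more.

```python
import json

# Parameter schema copied from the get_current_weather tool definition
WEATHER_PARAMS = {
    "type": "object",
    "properties": {
        "city": {"type": "string"},
        "unit": {"type": "string", "enum": ["celsius", "fahrenheit"]},
    },
    "required": ["city"],
}

def validate_arguments(raw_arguments, schema):
    """Return a list of problems found in the model's JSON arguments (empty if none)."""
    args = json.loads(raw_arguments)
    problems = []
    # Every required parameter must be present
    for name in schema.get("required", []):
        if name not in args:
            problems.append(f"missing required argument: {name}")
    # Every supplied argument must be declared and respect its enum, if any
    for name, value in args.items():
        spec = schema["properties"].get(name)
        if spec is None:
            problems.append(f"unexpected argument: {name}")
        elif "enum" in spec and value not in spec["enum"]:
            problems.append(f"invalid value for {name}: {value!r}")
    return problems

print(validate_arguments('{"city": "Athens", "unit": "celsius"}', WEATHER_PARAMS))
print(validate_arguments('{"unit": "kelvin"}', WEATHER_PARAMS))
```

The first call prints an empty list; the second reports the missing city and the unsupported unit.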

So, after defining the tools variable, we can pass it to the AI model:

# Step 2: send the user message along with the tool definition
messages = [
    {"role": "user", "content": "What's the weather like in Athens right now?"}
]

response = client.chat.completions.create(
    model="gpt-4o-mini",
    tools=tools,
    messages=messages
)

print(response.choices[0].message)

Here's what happens when we make this request. The model reads the user's message, “What's the weather like in Athens right now?”, and understands that the available get_current_weather tool can help answer this question with real, live data. So, instead of generating a text response directly, it decides to call the tool first. Specifically, the model response at this point looks like this:

ChatCompletionMessage(
    content=None,
    role='assistant',
    tool_calls=[
        ChatCompletionMessageToolCall(
            id='call_abc123',
            type='function',
            function=Function(
                name='get_current_weather',
                arguments='{"city": "Athens", "unit": "celsius"}'
            )
        )
    ]
)

Notice that content is None, because the model does not return a text response but a tool call. Now our task is to actually execute the tool the model selected and return the result to it. In our case, this means making a request to the weather API, using the arguments (i.e. city and unit of measurement) provided in the AI model response:

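# Step 3: execute the tool using the Open-Meteo API
import requests

# NOTE: the two endpoint URLs are missing from this listing. Set them from the
# Open-Meteo documentation (geocoding search and forecast endpoints) before running.
GEOCODING_URL = "..."
FORECAST_URL = "..."

def get_current_weather(city: str, unit: str = "celsius"):
    # geocode the city name to coordinates
    geo = requests.get(
        GEOCODING_URL,
        params={"name": city, "count": 1}
    ).json()
    lat = geo["results"][0]["latitude"]
    lon = geo["results"][0]["longitude"]

    # fetch current weather for those coordinates
    weather = requests.get(
        FORECAST_URL,
        params={
            "latitude": lat,
            "longitude": lon,
            "current": "temperature_2m,weather_code",
            "temperature_unit": unit
        }
    ).json()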

    temp = weather["current"]["temperature_2m"]
    return {"city": city, "temperature": temp, "unit": unit}

# extract the tool call from the response
tool_call = response.choices[0].message.tool_calls[0]
arguments = json.loads(tool_call.function.arguments)

# call the actual function
weather_result = get_current_weather(**arguments)

We can then add the result of the tool to the message history and send everything back to the model:

# Step 4: add the assistant's tool call AND the tool result to the message history
messages.append(response.choices[0].message)  # important: append the tool call first
messages.append({
    "role": "tool",
    "tool_call_id": tool_call.id,  # links the result back to the specific tool call
    "content": json.dumps(weather_result)
})

# Step 5: send everything back to the model for a final response
final_response = client.chat.completions.create(
    model="gpt-4o-mini",
    tools=tools,
    messages=messages
)

print(final_response.choices[0].message.content)

And now, we finally get an actual text response:

It's currently 29°C in Athens. Sounds like a great day to be outside!
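
The walkthrough above follows the happy path. In practice the arguments may not be valid JSON, or the tool itself may fail (for example, a city the geocoder cannot find). One common pattern, sketched here under assumptions, is to catch the failure and send it back as the tool result so the model can react to it. `run_tool_safely` and `flaky_weather` are hypothetical names used only for illustration.

```python
import json

def run_tool_safely(func, raw_arguments):
    """Run a tool and always return a JSON string that can go into a tool message."""
    try:
        arguments = json.loads(raw_arguments)
        result = func(**arguments)
    except json.JSONDecodeError:
        result = {"error": "arguments were not valid JSON"}
    except Exception as exc:  # unknown city, network failure, unexpected argument, ...
        result = {"error": f"{type(exc).__name__}: {exc}"}
    return json.dumps(result)

def flaky_weather(city, unit="celsius"):
    """Hypothetical tool that fails for unknown cities, standing in for a real API call."""
    if city != "Athens":
        raise LookupError(f"no geocoding result for {city}")
    return {"city": city, "temperature": 29, "unit": unit}

print(run_tool_safely(flaky_weather, '{"city": "Athens"}'))
print(run_tool_safely(flaky_weather, '{"city": "Atlantis"}'))
print(run_tool_safely(flaky_weather, '{"city": '))
```

The returned string would go into the "content" field of the tool message in Step 4, whether it holds a result or an error.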

2. Allowing the model to choose from multiple tools

Now let's look at a more practical example. In a real-world application, the model usually has access to not one but many tools, and therefore needs to figure out which one (or ones) to use based on what the user is asking.

Let's extend our first weather API example by adding an additional tool for currencies. In this case, we will use Frankfurter, a currency API that provides the daily rates of the European Central Bank and does not require an API key. So, let's revise our tools variable by adding a second tool for currency conversion:

tools = [
    {
        "type": "function",
        "function": {
            "name": "get_current_weather",
            "description": "Get the current weather for a given city",
            "parameters": {
                "type": "object",
                "properties": {
                    "city": {"type": "string", "description": "The name of the city"},
                    "unit": {"type": "string", "enum": ["celsius", "fahrenheit"]}
                },
                "required": ["city"]
            }
        }
    },
    {
        "type": "function",
        "function": {
            "name": "convert_currency",
            "description": "Convert an amount from one currency to another",
            "parameters": {
                "type": "object",
                "properties": {
                    "amount": {"type": "number", "description": "The amount to convert"},
                    "from_currency": {"type": "string", "description": "The source currency code, e.g. USD"},
                    "to_currency": {"type": "string", "description": "The target currency code, e.g. EUR"}
                },
                "required": ["amount", "from_currency", "to_currency"]
            }
        }
    }
]

And define the actual convert_currency function using the Frankfurter API:

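def convert_currency(amount: float, from_currency: str, to_currency: str):
    # NOTE: the request URL is missing from this listing. It was an f-string,
    # presumably built from from_currency and to_currency; restore it from the
    # Frankfurter documentation before running.
    url = "..."
    response = requests.get(url).json()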

    rate = response["rate"]
    converted = round(amount * rate, 2)
    return {
        "amount": amount,
        "from_currency": from_currency,
        "to_currency": to_currency,
        "converted_amount": converted,
        "rate": rate
    }

In this way, the model can handle a much wider range of user requests; now it can even answer questions about currencies, on top of the weather 😋. Now, if the user asks “How's the weather in Athens?”, the model should call get_current_weather. If they ask “How much is 100 USD in EUR?”, it will call convert_currency. And if we ask for something unrelated to weather and currencies, where none of the available tools can help, the model will just respond with text without calling any tool at all.

But let's see this in action:

messages = [
    {"role": "user", "content": "How much is 200 USD in EUR?"}
]

response = client.chat.completions.create(
    model="gpt-4o-mini",
    tools=tools,
    messages=messages
)

tool_call = response.choices[0].message.tool_calls[0]

Let's look at the answer:

print(tool_call.function.name)

where we find convert_currency. So, the model understood that the question “How much is 200 USD in EUR?” corresponds to the convert_currency tool. Let's look at the arguments as well:

print(tool_call.function.arguments)

where we find

'{"amount": 200, "from_currency": "USD", "to_currency": "EUR"}'

So the model correctly identifies convert_currency as the right tool and fills in the right arguments, without us doing anything other than providing the tool definitions and the user's message. It is this decision-making process that makes tool calling the basis of agent systems.
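
Once the model can pick between tools, our code needs a way to route the chosen name to the matching Python function. A simple option is a dictionary that maps tool names to functions, sketched below. The two tool functions here are stubs with canned values (the 0.9 rate is made up) so the example runs offline, and `TOOL_REGISTRY` and `dispatch` are hypothetical names.

```python
import json

def get_current_weather(city, unit="celsius"):
    """Stub with a canned result; the real version calls Open-Meteo."""
    return {"city": city, "temperature": 29, "unit": unit}

def convert_currency(amount, from_currency, to_currency):
    """Stub with a made-up rate; the real version calls Frankfurter."""
    rate = 0.9
    return {"converted_amount": round(amount * rate, 2), "rate": rate}

# Maps the tool name the model returns to the Python function that implements it
TOOL_REGISTRY = {
    "get_current_weather": get_current_weather,
    "convert_currency": convert_currency,
}

def dispatch(name, raw_arguments):
    """Look up the tool chosen by the model and call it with the parsed arguments."""
    if name not in TOOL_REGISTRY:
        return {"error": f"unknown tool: {name}"}
    return TOOL_REGISTRY[name](**json.loads(raw_arguments))

print(dispatch("convert_currency", '{"amount": 200, "from_currency": "USD", "to_currency": "EUR"}'))
print(dispatch("send_email", "{}"))
```

With a real response, the call would be dispatch(tool_call.function.name, tool_call.function.arguments).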

3. Calling multiple tools at the same time

Another interesting tool calling scenario is that many models, such as gpt-4o, can call multiple tools in a single response when a user request requires it. This is known as parallel tool calling.

For example, let's consider a situation where a user asks for something that requires both the get_current_weather and convert_currency tools to get the necessary information:

messages = [
    {"role": "user", "content": "What's the weather in Athens and how much is 100 USD in EUR?"}
]

response = client.chat.completions.create(
    model="gpt-4o-mini",
    tools=tools,
    messages=messages
)

for tool_call in response.choices[0].message.tool_calls:
    print(tool_call.function.name)
    print(tool_call.function.arguments)

In this case, the answer we get is the following:

get_current_weather
{"city": "Athens"}

convert_currency
{"amount": 100, "from_currency": "USD", "to_currency": "EUR"}

Note how both tools are called from the same model response. We can then run the appropriate tools with the given arguments and return the results of the tools to the model together. This is more efficient than sequential calls, and is how most advanced agents handle multi-part requests.
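
Here is a minimal sketch of handling such a response. Each result goes back as its own tool message carrying the id of the call it answers, just like `tool_call_id` in Step 4 earlier. The `tool_calls` list below uses plain dicts as a stand-in for the SDK objects, the tool functions are stubs with canned values, and running them in a thread pool is one possible choice, not something the original post prescribes.

```python
import json
from concurrent.futures import ThreadPoolExecutor

# Plain dicts standing in for the tool_calls list of a model response
tool_calls = [
    {"id": "call_1", "name": "get_current_weather", "arguments": '{"city": "Athens"}'},
    {"id": "call_2", "name": "convert_currency",
     "arguments": '{"amount": 100, "from_currency": "USD", "to_currency": "EUR"}'},
]

def get_current_weather(city, unit="celsius"):
    return {"city": city, "temperature": 29, "unit": unit}  # canned value

def convert_currency(amount, from_currency, to_currency):
    return {"converted_amount": round(amount * 0.9, 2)}  # made-up rate

REGISTRY = {
    "get_current_weather": get_current_weather,
    "convert_currency": convert_currency,
}

def execute(call):
    """Run one tool call and wrap the result as a tool message."""
    result = REGISTRY[call["name"]](**json.loads(call["arguments"]))
    # The id links this result back to the specific call it answers
    return {"role": "tool", "tool_call_id": call["id"], "content": json.dumps(result)}

# The calls are independent, so they can run concurrently; map() keeps input order
with ThreadPoolExecutor() as pool:
    tool_messages = list(pool.map(execute, tool_calls))

for message in tool_messages:
    print(message["tool_call_id"], message["content"])
```

All of these tool messages are appended to the history after the assistant message, and only then is the model called again.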

On my mind: So, what makes this an agent?

One thing that always bothers me is how the word “agent” gets thrown around everywhere. Agents, agentic workflows, anything with agent in the name is all the rage these days, but as you've probably discovered for yourself, not everything sold as an agent really is one.

So let's go back and think about what an agent actually is in the first place. At its core, an agent is something that perceives its environment, processes that information in some way, has a goal, and then decides what action to take to achieve it. Consider what the model does in tool calling: it sees the available tools, decides which one is appropriate to handle the user's request (if any), and passes that decision on to the rest of the code for execution. That, in its simplest form, is agency.

In real-world agent applications, the tool calling loop executes not once but many times: the model uses the result of one tool call to decide whether, and which, tool to call next. This is sometimes called the ReAct loop (Reason + Act), and it's what allows agents to handle complex, multi-step tasks that can't be solved in a single call.
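
To make the repeated loop concrete, here is a minimal offline sketch. `scripted_model` is a hypothetical stand-in for an LLM that needs two tool calls in sequence, where the second depends on the result of the first. The tools, the price, and the rate are all made up. The `max_steps` guard is an assumption of this sketch, added so that a model that keeps requesting tools cannot loop forever.

```python
import json

def scripted_model(messages):
    """Hypothetical stand-in for an LLM that needs two tool calls before answering."""
    tool_results = [json.loads(m["content"]) for m in messages if m["role"] == "tool"]
    if len(tool_results) == 0:
        return {"tool": "get_price_usd", "arguments": {"item": "coffee"}}
    if len(tool_results) == 1:
        # The first result determines the arguments of the next call
        return {"tool": "usd_to_eur", "arguments": {"amount": tool_results[0]["price"]}}
    return {"tool": None, "content": f"A coffee costs {tool_results[1]['eur']} EUR."}

TOOLS = {
    "get_price_usd": lambda item: {"item": item, "price": 4.0},    # made-up price
    "usd_to_eur": lambda amount: {"eur": round(amount * 0.9, 2)},  # made-up rate
}

def run_agent(user_text, max_steps=5):
    messages = [{"role": "user", "content": user_text}]
    for _ in range(max_steps):
        decision = scripted_model(messages)
        if decision["tool"] is None:
            return decision["content"]  # the model chose to answer in text
        # Our code executes the chosen tool and feeds the result back
        result = TOOLS[decision["tool"]](**decision["arguments"])
        messages.append({"role": "tool", "content": json.dumps(result)})
    return "Stopped: step limit reached."

print(run_agent("How much is a coffee in EUR?"))
```

The loop ends either when the model answers in text or when the step limit is hit.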

Ultimately, what I find most interesting about tool calling is how it changes the nature of what an LLM is. Until now, the language model was essentially a very sophisticated input-output function, which takes text as input and produces text as output. But with tool calling, we get access to an endless collection of additional functions, which we can combine with the LLM's reasoning power to create even more powerful systems.

✨ Thanks for reading! ✨

If you've made it this far, you may find pialgorithms useful — a platform we've been building that helps teams securely manage organizational information in one place.

Did you like this post? Join me on 💌 Substack and 💼 LinkedIn

All photos by the author, unless otherwise noted.
