Structured Outputs with LLMs: JSON Mode, Function Calling, and When to Use Each

0 4 9 minutes read

So far, we have talked a lot about popular techniques for improving the performance and cost of AI applications, such as response streaming or caching. Today, I want to talk about something different but equally important for building real AI applications: structured, machine-readable outputs.

In most of the examples I have shown so far, we have been dealing with free-text responses from the AI model. The user asks a question, the model answers in natural language, and we simply display that answer to the user. Simple and straightforward. But what if we need the model to return data in a specific format (e.g., a JSON object) so that we can process it programmatically afterwards? What if we need the model to extract certain fields from a text or image, populate a database entry, or trigger the next action based on its response? In those cases, getting back a wall of text is not very convenient. 🤔

Thankfully, there are solutions to this issue. There are two main ways to get structured, machine-readable outputs from an LLM: JSON Mode and Function Calling (also called tool use). These two are often confused with each other (which is to be expected, since they both deal with structured output, duh), but they serve very different purposes. On top of this, OpenAI has introduced a stricter variant of Function Calling called Structured Outputs, which takes the use of a schema one step further, as we shall see. In this post, we'll take a closer look at all three, understand how each one works under the hood, and find out when to use each one.

So, let's take a look!

1. What is JSON Mode?

JSON Mode is the simplest way to get machine-readable output from an LLM. It is just a parameter you can set in the API request to instruct the model to always return a valid JSON object. And that's all there is to it! However, this simplicity comes at a cost: there are no guarantees about the structure or schema of the JSON (remember that we did not define any schema, field names, types, or anything like that), only that it will be valid, parseable JSON.

For example, using OpenAI's API in Python, we can enable JSON Mode by adding the parameter response_format={"type": "json_object"} to our call to the model. Specifically, it will look like this:

from openai import OpenAI

client = OpenAI(api_key="your_api_key")

response = client.chat.completions.create(
    model="gpt-4o-mini",
    response_format={"type": "json_object"},
    messages=[
        {
            "role": "system",
            "content": "You are a helpful assistant. Always respond in JSON format."
        },
        {
            "role": "user",
            "content": "Extract the name, age, and city from this text: 'Maria is 32 years old and lives in Athens.'"
        }
    ]
)

print(response.choices[0].message.content)

And the answer will look like this:

{
  "name": "Maria",
  "age": 32,
  "city": "Athens"
}

And voilà! ✨ With one simple parameter change, we always get valid JSON. No need to parse strings or weird regex hacks.

There is a catch, however. JSON Mode ensures that the output is valid JSON, but it does not enforce a specific structure. If we run the same example multiple times, we may get slightly different field names or a slightly different structure each time. For example, one run may return "name" and another "full_name". That's a problem when we're trying to extract certain fields reliably.
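One way to guard against this kind of drift is to check the parsed object before using it. The sketch below is a minimal illustration, not part of the OpenAI SDK: parse_person and EXPECTED_KEYS are hypothetical names, and the two raw strings stand in for the text found in response.choices[0].message.content.

```python
import json

# Hypothetical: the keys our downstream code relies on.
EXPECTED_KEYS = {"name", "age", "city"}

def parse_person(raw: str) -> dict:
    """Parse JSON Mode output and fail loudly if the keys have drifted."""
    data = json.loads(raw)  # JSON Mode is meant to make this step succeed
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    missing = EXPECTED_KEYS - data.keys()
    if missing:
        raise ValueError(f"missing keys: {sorted(missing)}")
    return data

# Stand-ins for two runs of the same prompt.
good = '{"name": "Maria", "age": 32, "city": "Athens"}'
drifted = '{"full_name": "Maria", "age": 32, "city": "Athens"}'

print(parse_person(good)["name"])
try:
    parse_person(drifted)
except ValueError as err:
    print(err)
```

A check like this does not prevent the drift; it only turns a silent mismatch into an explicit error that we can log or retry on.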

Another thing is that, even with response_format={"type": "json_object"} set, it's good practice to also explicitly instruct the model to respond in JSON in the system prompt. In the example above, notice how we also added “Always respond in JSON format.” to the system message. Without this, the model may return valid JSON sometimes, but not always, as its behavior may be unpredictable.
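A small guard can make sure that instruction is never forgotten. This is only a sketch: ensure_json_instruction is a hypothetical helper, and the reminder text is simply the one used in the example above.

```python
def ensure_json_instruction(messages: list[dict]) -> list[dict]:
    """Return a copy of messages that mentions JSON at least once."""
    if any("json" in str(m.get("content", "")).lower() for m in messages):
        return list(messages)
    # No message mentions JSON, so prepend a system reminder.
    reminder = {"role": "system", "content": "Always respond in JSON format."}
    return [reminder, *messages]

messages = [{"role": "user", "content": "Where does Maria live?"}]
safe_messages = ensure_json_instruction(messages)
print(safe_messages[0]["content"])
```

The returned list can then be passed as the messages argument of the request, in place of the original list.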

2. What is Function Calling?

Function Calling (or tool use) is a more advanced way to get structured, machine-readable outputs from an LLM. Instead of just asking the model to format its response as JSON, we define a schema. That is, we explicitly provide a formal definition of the structure we want the output to follow, and this way the model is much more likely to return data that matches that schema. In other words, with Function Calling we define in advance which fields we expect, what types those fields should have, which ones are required, which are optional, and so on.

Here's what the same extraction example would look like using Function Calling:

from openai import OpenAI
import json

client = OpenAI(api_key="your_api_key")

# define the schema of the output we expect
tools = [
    {
        "type": "function",
        "function": {
            "name": "extract_person_info",
            "description": "Extract personal information from a text",
            "parameters": {
                "type": "object",
                "properties": {
                    "name": {
                        "type": "string",
                        "description": "The full name of the person"
                    },
                    "age": {
                        "type": "integer",
                        "description": "The age of the person"
                    },
                    "city": {
                        "type": "string",
                        "description": "The city the person lives in"
                    }
                },
                "required": ["name", "age", "city"]
            }
        }
    }
]

response = client.chat.completions.create(
    model="gpt-4o-mini",
    tools=tools,
    tool_choice={"type": "function", "function": {"name": "extract_person_info"}},
    messages=[
        {
            "role": "user",
            "content": "Extract the name, age, and city from this text: 'Maria is 32 years old and lives in Athens.'"
        }
    ]
)

# parse the structured output
tool_call = response.choices[0].message.tool_calls[0]
result = json.loads(tool_call.function.arguments)
print(result)

And the result will look like this:

{
  "name": "Maria",
  "age": 32,
  "city": "Athens"
}

The output of this example with Function Calling is the same as what we got using JSON Mode. The main difference, however, is that, unlike JSON Mode, with Function Calling the output is far more consistent: the model is steered toward a well-defined schema, with fixed field names, types, and any other attributes we define in it.

Bonus: A little more on Function Calling

Before moving on to Structured Outputs, it's worth pausing to explain a bit more about the original motivation and use of Function Calling, which is about much more than just getting structured outputs. Essentially, Function Calling is the foundation of agentic AI workflows. Specifically, in an agentic setting, the LLM's job is not just to answer the user's question, but rather to decide which action to take next based on the user's input.

For example, let's imagine a customer support assistant that can look up an order, issue a refund, or hand over to a human agent, depending on what the user asks. With Function Calling, we can define these candidate actions as “tools” (functions), and the model's output will specify which one should be called and with which arguments, based on its input. The example below defines the first two:

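from openai import OpenAI

client = OpenAI(api_key="your_api_key")

# two candidate actions the model can choose between
tools = [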
    {
        "type": "function",
        "function": {
            "name": "lookup_order",
            "description": "Look up the status of a customer order",
            "parameters": {
                "type": "object",
                "properties": {
                    "order_id": {"type": "string", "description": "The order ID"}
                },
                "required": ["order_id"]
            }
        }
    },
    {
        "type": "function",
        "function": {
            "name": "issue_refund",
            "description": "Issue a refund for a customer order",
            "parameters": {
                "type": "object",
                "properties": {
                    "order_id": {"type": "string"},
                    "reason": {"type": "string"}
                },
                "required": ["order_id", "reason"]
            }
        }
    }
]

response = client.chat.completions.create(
    model="gpt-4o-mini",
    tools=tools,
    messages=[
        {"role": "user", "content": "I want a refund for order #12345, it arrived broken."}
    ]
)

tool_call = response.choices[0].message.tool_calls[0]
print(tool_call.function.name)       # "issue_refund"
print(tool_call.function.arguments)  # '{"order_id": "12345", "reason": "arrived broken"}'

The assistant message in the API response looks something like this:

ChatCompletionMessage(
    content=None,
    role='assistant',
    tool_calls=[
        ChatCompletionMessageToolCall(
            id='call_abc123',
            type='function',
            function=Function(
                name='issue_refund',
                arguments='{"order_id": "12345", "reason": "arrived broken"}'
            )
        )
    ]
)

And the print statements will output, respectively:

issue_refund
{"order_id": "12345", "reason": "arrived broken"}

So, what's going on here? The model returns a tool_calls object instead of a normal text response (see how content is None). Within tool_calls, we see that the model decided to call issue_refund (not lookup_order), and filled in the arguments itself based on what the user said. We then parse those arguments and run the actual refund logic in our own system.

Notice how the model did not simply return the requested data; instead, it decided which candidate action is the most appropriate to perform, and then filled in the appropriate arguments in its response. This way, we can take those arguments and actually perform the corresponding action in our system. This is the real power of Function Calling, and that's why it's a fundamental part of agentic AI applications.

But let's get back to machine-readable outputs for now; we'll talk more about agentic AI workflows in another post.
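Performing the corresponding action usually comes down to mapping the tool name to a function in our own code. The following is a minimal sketch under stated assumptions: the two handler functions and the HANDLERS table are hypothetical stubs, and the hard-coded name and arguments stand in for tool_call.function.name and tool_call.function.arguments.

```python
import json

# Hypothetical local implementations of the two tools defined above.
def lookup_order(order_id: str) -> str:
    return f"Order {order_id}: status unknown (stub)"

def issue_refund(order_id: str, reason: str) -> str:
    return f"Refund issued for order {order_id} ({reason})"

# Map tool names, as the model returns them, to Python callables.
HANDLERS = {"lookup_order": lookup_order, "issue_refund": issue_refund}

def dispatch(name: str, arguments: str) -> str:
    """Run the handler the model picked, with the arguments it filled in."""
    handler = HANDLERS.get(name)
    if handler is None:
        raise ValueError(f"unknown tool: {name}")
    return handler(**json.loads(arguments))

# Stand-ins for tool_call.function.name and tool_call.function.arguments.
result = dispatch("issue_refund", '{"order_id": "12345", "reason": "arrived broken"}')
print(result)
```

Keeping the mapping in a plain dictionary means the model can only ever trigger functions we have explicitly registered.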

3. What About Structured Outputs?

A stricter variant of Function Calling is Structured Outputs. Although Function Calling directs the model to provide output following a defined schema, it does not strictly enforce it. In practice, this means that some deviations from the defined schema are still possible. Such deviations can be:

A field marked as required is nevertheless omitted, if the model struggles to find its value
Additional fields not defined in our schema are added
A field defined as integer comes back as the string "32" instead of 32

…and so on.

This happens because, in Function Calling, the model is trying to follow the schema, but this is still best-effort generation. As with any LLM output, the output here is still predicted token by token, with the schema acting as a strong hint. There is still a chance that the token-by-token generation will go off track somewhere along the way and produce output that deviates from the defined schema.
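When strict enforcement is not available, the three deviations listed above can at least be detected after the fact. The sketch below is a hand-rolled check for flat schemas with primitive types only; find_deviations and PY_TYPES are hypothetical names, and a real project would more likely rely on a full JSON Schema validator.

```python
import json

# JSON Schema type name -> Python type(s), for flat schemas like the one above.
PY_TYPES = {"string": str, "integer": int, "number": (int, float), "boolean": bool}

def find_deviations(schema: dict, arguments: str) -> list[str]:
    """List the ways a tool call's arguments depart from a flat schema."""
    data = json.loads(arguments)
    props = schema["properties"]
    problems = []
    for field in schema.get("required", []):
        if field not in data:
            problems.append(f"missing required field: {field}")
    for field, value in data.items():
        if field not in props:
            problems.append(f"unexpected field: {field}")
            continue
        expected = props[field]["type"]
        is_bool = isinstance(value, bool)  # bool is a subclass of int in Python
        ok = isinstance(value, PY_TYPES[expected]) and (expected == "boolean" or not is_bool)
        if not ok:
            problems.append(f"wrong type for {field}: expected {expected}")
    return problems

schema = {
    "type": "object",
    "properties": {
        "name": {"type": "string"},
        "age": {"type": "integer"},
        "city": {"type": "string"},
    },
    "required": ["name", "age", "city"],
}

# One made-up response showing all three deviations at once.
print(find_deviations(schema, '{"name": "Maria", "age": "32", "nickname": "M"}'))
```

An empty list means the arguments match the schema as far as this simplified check can tell.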

Structured Outputs, on the other hand, takes Function Calling one step further by ensuring that every field in the defined schema always appears in the output as defined, with no surprises and no missing or extra fields. The main difference is that OpenAI uses constrained decoding behind the scenes. This means that at each token step, the model is only allowed to generate tokens that keep the output consistent with the schema. In other words, the schema is enforced at the generation level, instead of simply being suggested through the prompt.
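To make the idea of masking tokens at each step more concrete, here is a toy illustration. It is not OpenAI's implementation: the tiny grammar, the vocabulary, and the scores returned by fake_model_scores are all made up, and the "model" deliberately prefers tokens that would break the schema.

```python
import json

# Toy grammar for the single shape {"age": <integer>}:
# state -> {allowed token: next state}
GRAMMAR = {
    "start": {"{": "key"},
    "key": {'"age"': "colon"},
    "colon": {":": "value"},
    "value": {"32": "close", "7": "close"},  # bare integers only, no quoted "32"
    "close": {"}": "done"},
}

def fake_model_scores(prefix: list[str]) -> dict[str, float]:
    """Stand-in for a model: made-up scores that favor off-schema tokens."""
    return {"Sure!": 0.9, '"32"': 0.8, "32": 0.7, "{": 0.6,
            '"age"': 0.5, ":": 0.4, "}": 0.3, "7": 0.1}

def constrained_decode() -> str:
    state, tokens = "start", []
    while state != "done":
        scores = fake_model_scores(tokens)
        allowed = GRAMMAR[state]  # mask: only these tokens may be emitted
        token = max(allowed, key=lambda t: scores[t])
        tokens.append(token)
        state = allowed[token]
    return "".join(tokens)

# Without the mask, the highest-scoring first token is chatty text.
all_scores = fake_model_scores([])
unconstrained_first = max(all_scores, key=all_scores.get)
output = constrained_decode()
print(unconstrained_first)
print(output, json.loads(output))
```

Even though the fake model scores "Sure!" and the quoted "32" higher, the mask never lets them through, so the result always parses and the age is always an integer.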

OpenAI's Structured Outputs can be activated by simply setting strict: true in the function definition:

tools = [
    {
        "type": "function",
        "function": {
            "name": "extract_person_info",
            "strict": True,  # enables Structured Outputs
            "parameters": {
                "type": "object",
                "properties": {
                    "name": {"type": "string"},
                    "age": {"type": "integer"},
                    "city": {"type": "string"}
                },
                "required": ["name", "age", "city"],
                "additionalProperties": False
            }
        }
    }
]
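Note that the strict example lists every property under required and sets additionalProperties to False. My understanding is that strict mode expects both, but treat that as an assumption and check OpenAI's documentation for the current rules. The hypothetical helper below only checks those two settings on a flat parameters schema before we send it.

```python
def strict_mode_issues(parameters: dict) -> list[str]:
    """Check a flat parameters schema for the two settings used in the strict example."""
    issues = []
    if parameters.get("additionalProperties") is not False:
        issues.append("additionalProperties should be False")
    unlisted = set(parameters.get("properties", {})) - set(parameters.get("required", []))
    if unlisted:
        issues.append(f"not listed in required: {sorted(unlisted)}")
    return issues

strict_params = {
    "type": "object",
    "properties": {"name": {"type": "string"}, "age": {"type": "integer"}},
    "required": ["name", "age"],
    "additionalProperties": False,
}
loose_params = {
    "type": "object",
    "properties": {"name": {"type": "string"}, "age": {"type": "integer"}},
    "required": ["name"],
}

print(strict_mode_issues(strict_params))
print(strict_mode_issues(loose_params))
```

Running a check like this in a unit test can catch a schema that was edited without updating required.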

But again, this comes at a cost. Structured Outputs is available in GPT-4o and later models, with older models falling back to JSON Mode. Not all JSON Schema features are supported, and the first request with a new schema may be slower while OpenAI processes the schema.

Nevertheless, it is the strongest and safest way to enforce a specific schema on the model's output, with no room for deviation. In production systems where reliability and consistency are very important, this is usually the safest option.

But aren't these all the same?

JSON Mode, Function Calling, and Structured Outputs may seem to do the same thing, since they all return JSON from the model. However, as we have seen, they differ considerably in what they guarantee and what they are designed for. In particular:

Schema enforcement: JSON Mode returns valid JSON, but with no guarantees about its structure. Function Calling returns JSON that is meant to match the defined schema, following specific field names, types, and required fields, but deviations are still possible. Structured Outputs goes further, enforcing the schema at the generation level, so that deviations are ruled out.
Use case: JSON Mode is for situations where we need a machine-readable response but can live with a flexible format. Function Calling was designed specifically for situations where the model needs to initiate an action or pass arguments to an external tool, so machine-readable output is really a side effect of it. Structured Outputs is Function Calling with guaranteed schema adherence, making it ideal for production pipelines where we need consistent output.
Ease of setup: JSON Mode is the easiest option to set up; it's just one parameter change, with no schema definition. On the flip side, for Function Calling and Structured Outputs, we also need to design and set up a JSON schema.
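The comparison above can be condensed into a rough rule of thumb. The function below is only a sketch of that reasoning, with hypothetical names; it is not an official decision procedure.

```python
def choose_method(triggers_action: bool, needs_exact_schema: bool) -> str:
    """Pick an approach from the two questions the comparison above turns on."""
    if needs_exact_schema:
        return "Structured Outputs"  # schema enforced during generation
    if triggers_action:
        return "Function Calling"    # model picks a tool and fills its arguments
    return "JSON Mode"               # valid JSON, flexible shape

print(choose_method(triggers_action=False, needs_exact_schema=False))
print(choose_method(triggers_action=True, needs_exact_schema=False))
print(choose_method(triggers_action=True, needs_exact_schema=True))
```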

Having said that, OpenAI itself recommends always using Structured Outputs instead of JSON Mode whenever possible, as a general rule of thumb.

On my mind

Getting machine-readable outputs from LLMs, and choosing the right method to do so, can make a big difference in the reliability and maintainability of any AI system. Free-text responses are great for conversational interactions, but when our LLM is part of a larger system (feeding downstream pipelines, triggers, databases, etc.), structured responses are essential. JSON Mode, Function Calling, and Structured Outputs can all provide structured output, each with a different level of robustness. As with most decisions in AI engineering, the right choice depends on what you're building and how much variability you can tolerate.

If you've made it this far, you may find pialgorithms useful — a platform we've been building that helps teams securely manage organizational information in one place.

All photos by the author, unless otherwise noted.

nimda 3 weeks ago
