How to ensure reliability in LLM apps

LLMs have taken over computer science at record speed. They are powerful models able to perform a wide variety of tasks. However, the stochastic nature of LLMs also makes them unreliable. In this article, I discuss how you can ensure reliability in your LLM applications by prompting the model effectively and handling its output properly.
You can also read my article from NVIDIA GTC Paris 2025 about building powerful machine-reading applications.
Contents
Motivation
My motivation for this topic is that I am consistently building new applications using LLMs. LLMs are general tools that can be applied to many tasks, such as classification, summarization, information extraction, and more. In addition, the rise of vision-language models also enables us to process images the same way we process text.
I often run into the problem that my LLM applications are inconsistent. Sometimes the LLM doesn't answer in the desired format, or I am unable to parse the LLM response. This is a major problem when you work in a production environment and are fully dependent on consistency in your application. I will therefore discuss the strategies I use to ensure reliability for my applications in a production setting.
Ensuring output consistency
Markup tags
To ensure consistent output, I use a technique where my LLM answers within markup tags. I use a system prompt like:

prompt = f"""
Classify the text into "Cat" or "Dog".
Provide your response in <classification> tags.
"""
And the model will almost always answer:
<classification>Cat</classification>
or
<classification>Dog</classification>
You can then parse the response using the following code:

def _parse_response(response: str):
    return response.split("<classification>")[1].split("</classification>")[0]
The reason markup tags are so effective is that models are trained to adhere to them. OpenAI, Qwen, Google, and others train their models using markup tags. The models are thus highly effective at using these tags and will, in almost all cases, comply with the expected response format.
For example, reasoning models, which have seen a recent resurgence, begin their responses with <think> tags.
In addition, I try to use markup tags as much as I can elsewhere in my prompts. For example, if I provide few-shot examples to my model, I do something like this:
prompt = f"""
Classify the text into "Cat" or "Dog".
Provide your response in <classification> tags.

<examples>
This is an image showing a cat -> <classification>Cat</classification>
This is an image showing a dog -> <classification>Dog</classification>
</examples>
"""
I do two things here that help the model:
- I provide the examples inside <examples> tags.
- In my examples, I adhere to my expected response format, using <classification> tags.
Using markup tags, you can thus ensure a high level of consistency in the output from your LLM.
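The simple parsing shown above can be hardened a little. Below is a minimal sketch (the `parse_tag` helper name and its signature are my own, not from the original) that raises a clear error when the tags are missing, instead of an opaque IndexError:

```python
def parse_tag(response: str, tag: str = "classification") -> str:
    """Extract the content of the first <tag>...</tag> pair.

    Raises ValueError when the tags are missing, so the caller can
    trigger a retry instead of crashing deeper in the pipeline.
    """
    open_t, close_t = f"<{tag}>", f"</{tag}>"
    if open_t not in response or close_t not in response:
        raise ValueError(f"response missing {open_t} tags: {response!r}")
    return response.split(open_t)[1].split(close_t)[0].strip()

print(parse_tag("Sure! <classification>Cat</classification>"))  # Cat
```

Raising ValueError here pairs naturally with the retry strategies discussed later: a malformed response becomes an exception you can catch and retry on.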
Output validation
Pydantic is a tool you can use to validate the output of your LLMs. You can define types and verify that the model output adheres to the expected schema. For example, you can use the snippet below, based on this article:
from pydantic import BaseModel
from openai import OpenAI

client = OpenAI()

class Profile(BaseModel):
    name: str
    email: str
    phone: str

# `user` is assumed to be defined earlier, e.g. a raw text blob about the user
resp = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {
            "role": "user",
            "content": f"Return the `name`, `email`, and `phone` of user {user} in a JSON object.",
        },
    ],
)

Profile.model_validate_json(resp.choices[0].message.content)
As you can see, we prompt the model to respond with a JSON object, and we then use Pydantic to validate that the response matches what we expect.
I would also note that sometimes it is easier to simply write your own output-validation function. In the last example, the only requirements on the response object are that it contains the keys name, email, and phone, and that all values are of string type. You can validate this in Python with a function:
import json

def validate_output(output: str):
    data = json.loads(output)
    assert "name" in data and isinstance(data["name"], str)
    assert "email" in data and isinstance(data["email"], str)
    assert "phone" in data and isinstance(data["phone"], str)
This way, you don't need to install any extra packages, and in many cases, it is easier to set up.
System prompt optimization
You can make other optimizations to your system prompt to ensure more reliable output. I always recommend making your prompt as structured as possible, using:
- Markup tags, as mentioned earlier
- Lists, such as the one I am writing in here
In addition, you should always give clear instructions. You can use the following test to verify the quality of your prompt:
If you gave the prompt to another person who had never seen this task before and has no prior context, would that person be able to perform the task successfully?
If you can't have a human do the task, you usually can't expect an AI to do it (at least not yet).
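Putting this advice together, a structured system prompt might look like the following sketch. The section tag names (`<task>`, `<rules>`, `<examples>`) are illustrative choices on my part, not a required convention:

```python
# A hypothetical structured system prompt: clear instructions,
# markup tags for each section, and an explicit list of rules.
SYSTEM_PROMPT = """
<task>
Classify the user's text into "Cat" or "Dog".
</task>

<rules>
- Respond with exactly one classification.
- Provide your response in <classification> tags.
</rules>

<examples>
This is an image showing a cat -> <classification>Cat</classification>
This is an image showing a dog -> <classification>Dog</classification>
</examples>
"""
```

A prompt laid out like this is also easy to review against the "would a person with no context succeed?" test above, since each section can be checked for completeness on its own.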
Handling errors
Errors are unavoidable when dealing with LLMs. If you make enough API calls, you can be almost certain that sometimes the response will not be in your desired format, or some other issue will occur.
In these cases, it is important that you have a robust system in place to handle such errors. I use the following strategies:
- Retry with exponential backoff
- Increasing the temperature
- Having fallback LLMs
Now, let me go into each point.
Retry with exponential backoff
It is important to have a retry mechanism in place, considering the many issues that can arise when making API calls. You may encounter problems such as rate limits, incorrectly formatted output, or slow responses. In these cases, you should wrap your LLM call in a try/except and retry if it fails. Usually, it is wise to use exponential backoff, especially for rate-limit errors. The reason for this is to ensure you wait long enough to avoid hitting the rate limit again and to reduce load on the provider's infrastructure.
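As a sketch of this idea (the helper below is my own, not from any specific library; in practice you might reach for a package such as `tenacity`), a simple retry loop with exponential backoff and jitter looks like this:

```python
import random
import time

def retry_with_backoff(fn, max_retries=5, base_delay=1.0, max_delay=60.0):
    """Call fn(), retrying on any exception with exponential backoff.

    The wait doubles on each attempt (capped at max_delay), with a
    little random jitter so many clients don't retry in lockstep.
    """
    for attempt in range(max_retries):
        try:
            return fn()
        except Exception:
            if attempt == max_retries - 1:
                raise  # out of retries: surface the error to the caller
            delay = min(base_delay * (2 ** attempt), max_delay)
            time.sleep(delay + random.uniform(0, delay * 0.1))

# Example: a flaky call that fails twice, then succeeds.
calls = {"n": 0}
def flaky():
    calls["n"] += 1
    if calls["n"] < 3:
        raise RuntimeError("rate limited")
    return "ok"

result = retry_with_backoff(flaky, base_delay=0.01)
print(result)  # ok
```

In a real application, `fn` would wrap your LLM API call plus the output parsing, so that a malformed response also triggers a retry.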
Increasing the temperature
I also sometimes recommend increasing the temperature slightly on retries. If you set the temperature to 0, you tell the model to act deterministically. However, sometimes this can have a negative effect.
For example, consider a situation where the model failed to respond in the correct output format. If you retry with a temperature of 0, you may simply run into the same problem again. I therefore recommend raising the temperature a bit, for example to 0.1, to allow some variability in the model's output while still keeping its results fairly consistent.
This is similar to what is done for many agents: they run with a higher temperature. Agents need to avoid getting stuck in loops, and a higher temperature can help them avoid repeating the same mistakes.
Fallback LLMs
Another powerful way to handle errors is to have backup LLMs. I recommend supporting several LLM providers for all of your API calls. For example, first try OpenAI; if that fails, use Gemini; and if that fails, you can use Claude.
This ensures reliability in the event of provider-specific issues. These can be issues such as:
- The server being down (for example, if an OpenAI endpoint is unavailable for a period of time)
- Rate limiting (sometimes the LLM provider throttles your requests)
In general, it is simply not a good idea to be completely dependent on a single provider.
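A minimal sketch of provider fallback, assuming each provider is wrapped in a callable that takes the prompt and raises on failure (the function and provider names below are illustrative, not a real SDK):

```python
def call_with_fallback(providers, prompt):
    """Try each (name, call) provider in order; return the first success.

    `providers` is a list of (name, callable) pairs. Each callable takes
    the prompt and returns the model's text, raising on any failure.
    """
    errors = []
    for name, call in providers:
        try:
            return call(prompt)
        except Exception as err:
            errors.append(f"{name}: {err}")  # record and fall through
    raise RuntimeError("all providers failed: " + "; ".join(errors))

# Simulated providers: the first is down, the second succeeds.
def openai_call(prompt):
    raise ConnectionError("endpoint unavailable")

def gemini_call(prompt):
    return "Cat"

providers = [("openai", openai_call), ("gemini", gemini_call)]
print(call_with_fallback(providers, "Classify: ..."))  # Cat
```

One design consideration: different providers format output differently, so keep your parsing and validation provider-agnostic (for example, the markup-tag approach from earlier works across all of them).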
Conclusion
In this article, I discussed how you can ensure reliability in your LLM applications. LLM applications are inherently stochastic because you cannot directly control the output of the LLM. It is therefore important to have the right strategies in place, both to reduce the errors that occur and to handle errors when they do occur.
I discussed the following techniques to minimize and handle errors:
- Markup tags
- Output validation
- System prompt optimization
- Retry with exponential backoff
- Increasing the temperature
- Having fallback LLMs
If you incorporate these strategies into your application, you can build an LLM app that is both powerful and reliable.
Follow me on:
- Personal Blog
- LinkedIn
- X / Twitter
- Medium
- Threads



