How to Make Claude Code Validate Its Own Work

Claude Code gives you the most powerful model out of the box. To use its full potential, however, you need to give it a way to verify and validate its own work.
In a previous article, I talked about Claude validating its own work as an important part of how I am expanding my use of Claude Code. In this article, I will go deeper into how I actually get Claude to validate its work.
The benefits are significant. When you make Claude validate its work, you get:
- Better one-shot performance (you spend less time re-prompting)
- An agent that can work autonomously for longer (it keeps going until it can successfully verify its work)
- An agent that can complete more complex tasks
I will go through some concrete tasks where asking Claude to verify its work saves me a lot of time, and recap my thought process when setting Claude up this way.
Why should you make Claude validate its work?
The first reason you should have Claude validate its work is that it simply makes Claude perform better. You can see why with the following scenario:
Imagine you had to write a piece of code that calculates the Fibonacci sequence. Obviously, other people have done this exact task before, and it would be easy enough to do. Now imagine you have to complete the task without ever getting a chance to run the code and see its output; that is, you have to write the complete, correct code on your very first attempt. Naturally, this is more difficult than if you get to test the code, fix it when it doesn't produce the right numbers, and continue like that until your code produces the right output.
The same concept applies to Claude Code. If you don't give it a chance to verify its work, it's like asking it to code the Fibonacci sequence without ever letting it see the output. You are putting Claude Code at a disadvantage, and it will produce worse results than when it gets a chance to test its code.
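To make the analogy concrete, the feedback loop looks something like this: write the code, run it against a known expected result, and fix it until the check passes. (A minimal Python sketch; the test values are just the first ten Fibonacci numbers.)

```python
def fibonacci(n: int) -> list[int]:
    """Return the first n Fibonacci numbers."""
    seq = []
    a, b = 0, 1
    for _ in range(n):
        seq.append(a)
        a, b = b, a + b
    return seq

# The verification step: compare against a known expected result.
# Without the chance to run this check, you would have to get the
# implementation right on the very first attempt.
expected = [0, 1, 1, 2, 3, 5, 8, 13, 21, 34]
assert fibonacci(10) == expected, f"got {fibonacci(10)}, expected {expected}"
print("fibonacci(10) verified against expected output")
```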
How to make Claude validate its work in practice
The phrase “make Claude validate its work” is often thrown around, for example on LinkedIn and X. However, I see few people explain exactly how they do it themselves, which makes it difficult for others to replicate.

Therefore, I will include real-world examples of how I made Claude validate its work, covering the whole process:
- Encountering a problem
- Understanding what is causing the problem
- Implementing the solution with Claude while making sure it can verify the result
Long LLM processing times
My first concrete example is a situation where I was analyzing user interviews conducted with an AI chat agent. After each interview, I have to process it: download the transcript, segment it, and extract data from it.
I started investigating the problem by reproducing it: I ran the LLM processing on the same thread multiple times and timed it. It turned out that the average time was acceptable, about 30 seconds, but roughly one run in ten took more than two minutes, which is not acceptable at all. I explained the situation to Claude Code and asked it what could be causing the issue.
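If you want to reproduce this kind of tail-latency issue yourself, a simple timing loop over repeated runs is enough to surface it. (A minimal sketch; `process_transcript` is a hypothetical stand-in that simulates the real LLM step with scaled-down latencies.)

```python
import random
import statistics
import time

def process_transcript(transcript: str) -> str:
    """Hypothetical stand-in for the real LLM processing step.
    Simulated here: usually fast, occasionally very slow."""
    time.sleep(0.03 if random.random() > 0.1 else 0.12)
    return "processed"

timings = []
for _ in range(20):
    start = time.perf_counter()
    process_transcript("example transcript")
    timings.append(time.perf_counter() - start)

# An acceptable average can hide an unacceptable tail, which is
# exactly what repeated runs reveal.
print(f"mean: {statistics.mean(timings):.3f}s, max: {max(timings):.3f}s")
```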
The most likely reason, it turned out, was that I was putting too many tokens in and asking for too many tokens out, which in some runs took a long time to generate. The solution was to take this one LLM call and split it into three smaller calls, so that each call has fewer output tokens to produce and the three can run in parallel.
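A minimal sketch of what that split looks like, assuming a hypothetical async `call_llm` helper (your actual SDK will differ, and the three sub-prompts shown are illustrative): instead of one call that has to generate all the output tokens sequentially, three smaller calls each generate a fraction of them and run concurrently.

```python
import asyncio

async def call_llm(prompt: str) -> str:
    """Hypothetical async wrapper around whichever LLM SDK you use."""
    raise NotImplementedError  # wire up your actual client here

async def process_monolithic(transcript: str) -> str:
    # One big call: every output token is generated sequentially.
    return await call_llm(f"Segment, extract data, and summarize:\n{transcript}")

async def process_split(transcript: str) -> list[str]:
    # Three smaller calls, each with fewer output tokens to produce,
    # running in parallel: latency is roughly the slowest single call.
    return await asyncio.gather(
        call_llm(f"Segment this transcript:\n{transcript}"),
        call_llm(f"Extract data from this transcript:\n{transcript}"),
        call_llm(f"Summarize this transcript:\n{transcript}"),
    )
```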
This is an example of the perfect kind of task for Claude Code to validate its own work on:
The perfect task for validating your work is one where you have a known expected result, and you can keep working and iterating on the problem until you reach that result.
This problem fits that description: I have the exact input, and the expected output is whatever the single monolithic LLM call produces. So I could simply ask Claude Code to split the LLM call into three pieces and verify its own work by comparing the combined result of the split calls against the monolithic call. The two should be almost exactly the same (not exactly the same, because LLM outputs are stochastic).
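Since exact string equality is too strict for stochastic outputs, the check itself can be a similarity measure with a threshold. (A sketch; `SequenceMatcher` and the 0.95 threshold are my own illustrative choices, not necessarily what Claude used.)

```python
from difflib import SequenceMatcher

def outputs_match(monolithic: str, combined_split: str,
                  threshold: float = 0.95) -> bool:
    """LLM outputs are stochastic, so require near-identical
    rather than exactly identical results."""
    similarity = SequenceMatcher(None, monolithic, combined_split).ratio()
    return similarity >= threshold

# The loop Claude Code effectively runs: keep fixing the split
# implementation until its combined output matches the monolithic call.
```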
I prompted Claude Code with all this information. It kept iterating on its code until it confirmed that the results were the same, fixed one problem along the way, and came back to me with a working solution.
Designing a web page
The last example was a good one because it is very easy for an LLM, or Claude Code, to verify the results: it can simply make an API call, compare the output, and see if it's correct.
However, what happens when the output you want to produce is visual?
My second example involves a problem where I had a design for what a web page should look like, and I wanted Claude Code to implement that exact design, fitted, of course, to the application's framework and the existing codebase it was written in.
This may sound like a daunting task because it involves visually inspecting the result. Fortunately, we have Claude in Chrome, an MCP integration that gives Claude access to your Google Chrome so it can check the result itself.
I had a screenshot of the design showing what the page should look like, including how the page was organized into different sections and the color scheme used in the design.
The task setup is straightforward: I simply provided Claude Code with the screenshot and asked it to implement the design. If your design is simple, this may work out of the box. However, complex designs are difficult to one-shot, especially if you are working on a large existing codebase with many dependencies and design conventions.
So, to give Claude Code the best chance of solving the problem on its own, I gave it access to Google Chrome. If you want to set this up yourself, you can simply ask your Claude Code instance: “How do I give you access to Google Chrome?”
I instructed my Claude agent to first attempt to implement the design, then go into Google Chrome, load the relevant page (after starting the dev server, of course), take a screenshot, and compare it against the target design. If it sees any inconsistencies, it must keep iterating until the two designs are nearly identical.
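In my setup the comparison is done visually by Claude through Chrome, but if you want a programmatic backstop, a crude pixel-difference check between the design screenshot and a screenshot of the rendered page is one option. (A rough sketch using Pillow; the file names and the 2% threshold are illustrative assumptions, not part of my actual setup.)

```python
from PIL import Image, ImageChops

def designs_nearly_identical(target_path: str, rendered_path: str,
                             max_diff_fraction: float = 0.02) -> bool:
    """Crude check: fraction of pixels that differ between the design
    screenshot and a screenshot of the rendered page."""
    target = Image.open(target_path).convert("RGB")
    rendered = Image.open(rendered_path).convert("RGB").resize(target.size)
    diff = ImageChops.difference(target, rendered).convert("L")
    changed = sum(1 for px in diff.getdata() if px > 16)  # tolerate noise
    return changed / (diff.width * diff.height) <= max_diff_fraction

# e.g. keep iterating until this returns True:
# designs_nearly_identical("design.png", "rendered.png")
```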
In addition, I asked my agent to inform me of any conflicts between the two designs, for example if something was impossible to implement or it was unclear how to do it. This is a good strategy because it makes Claude come to you with questions, instead of you having to teach Claude everything about the design up front. Overall, this is a great way to work more effectively with your coding agents.
Conclusion
In this article, I've covered how to make Claude Code validate its own work, which greatly improves the performance of your Claude Code instance, or any coding agent in general. I discussed why this matters: allowing Claude to validate its work simply makes it perform better, with a higher one-shot success rate, the ability to work autonomously for longer, and the ability to complete complex tasks. I then walked through two specific situations where I gave Claude Code a way to verify its work: splitting an LLM call into three separate calls to improve latency, and implementing a given web page design in my application. Both are situations where letting Claude validate its work increased its effectiveness.
👋 Stay in touch
👉 My Free eBook and Webinar:
🚀 10x Your Engineering with LLMs (Free 3-Day Email Course)
📚 Get my free ebook Vision Language Models
💻 My webinar on Vision Language Models
👉 Find me on social media:
💌 Substack
🐦 X / Twitter



