
How To Use Claude Code In Your Browser

A common misconception about coding agents is that they can only be used for coding or programming. However, they are very general agents and can perform basically all office functions, although with varying degrees of success.

One area that has received a lot of attention is web browsing with coding agents such as Claude Code and OpenAI's Codex.

Agents have an amazing ability to navigate the web, which is very useful for many different tasks.

Browsing the web can be useful in many different situations, such as downloading information from the Internet or filling out forms. However, some use cases may violate a site's terms of service, so check before you automate anything. The main use case I'll cover today is completely legal: having coding agents navigate applications you developed yourself in order to test and validate implementations.

I have previously written a lot about making tasks verifiable whenever you ask coding agents to perform actions for you. Giving coding agents access to your browser to test an implementation is an important part of this verification.

This infographic highlights the main topics of this article. I will discuss how to give your coding agent access to the browser to make it more powerful: why the agent needs browser access, the loop you should set up, and how to use browser access to make the agent verify its own work. Image via ChatGPT.

Why code agents should use your browser

First, I'd like to cover why you should care about using browsers with your coding agents. Browsers are an important interface that people use to interact with the world. With your browser, you can perform many different actions, such as reading information, completing applications, and more.

Given that this is such an important interface for people to interact with the world, a lot of attention and research has been directed toward navigating browsers effectively. There are many companies out there that specialize in browser navigation, and all the frontier labs offer such integration in their products, such as OpenAI's Codex and Anthropic's Claude Code.

Imagine you tell a coding agent to implement a design that follows an HTML design file. The agent is, of course, good at the coding part and can start implementing immediately. However, if it cannot navigate the browser, it has no way to verify its own work.

This greatly increases the chance that the agent makes mistakes and fails to implement the exact design you wanted.

Fortunately, there is a very easy fix for this problem: give your coding agent access to the browser. Allow it to take screenshots of the design it created and compare them to screenshots of the design you want to implement. The agent can then keep iterating until the implementation matches the design file.

This saves you, as a programmer, a lot of time, since you don't have to repeatedly check the result and tell the agent about the mistakes it made in the design implementation. It also lets you multitask and be more productive as a developer.

How does this work

Before moving on to how to navigate browsers with Claude Code, I want to briefly cover how it works.

In theory, navigating a browser is very simple. The coding agent opens a browser, where it has access to several actions:

  • Take a screenshot
  • Click (coordinate based)
  • Enter text

These are the three main actions performed by the coding agent, and they are all the actions you need to interact with the browser:

  1. The code agent needs to take screenshots because that's how it finds out what's on each page and where to click.
  2. The code agent also needs to be able to click different places on the website, for example, click buttons or click input fields.
  3. The code agent needs to enter text, for example into an input field it has just clicked.

Clicking is coordinate based.

So when the coding agent wants to click somewhere, it outputs a command like the following:

click(x=0.754, y=0.328)

It basically calls the click function and provides the coordinates where it wants to click. Coordinates are usually normalized to a set range, such as between 0 and 1.
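To make the normalization concrete, here is a minimal sketch of how such a click could be mapped onto real pixels. The helper name `to_pixels` and the 1280x720 viewport are assumptions for illustration only; the actual tooling behind a coding agent may handle this differently.

```python
def to_pixels(x: float, y: float, width: int, height: int) -> tuple[int, int]:
    """Map normalized (0..1) coordinates to integer pixel coordinates.

    Hypothetical helper for illustration, not part of any agent's API.
    """
    if not (0.0 <= x <= 1.0 and 0.0 <= y <= 1.0):
        raise ValueError("normalized coordinates must lie between 0 and 1")
    # Clamp so that x=1.0 or y=1.0 still lands on the last valid pixel.
    px = min(int(x * width), width - 1)
    py = min(int(y * height), height - 1)
    return px, py


# The click from the text, applied to an assumed 1280x720 viewport.
print(to_pixels(0.754, 0.328, 1280, 720))  # prints (965, 236)
```

Normalized coordinates have the advantage that the same click description stays valid when the screenshot is resized before it is sent to the model.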

Then, once the agent has clicked on a certain area, such as an input field, it can enter text. Together, these actions let it do everything it needs to do in the browser. The agent can, of course, also perform different types of clicks, such as right-clicking for more options on a page.

This loop then repeats: the agent takes a screenshot, chooses which action to perform, checks whether it has achieved its goal, and continues. It simply keeps going like this until it reaches its goal in the browser.
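The loop described above can be sketched in a few lines. Everything here is a stand-in: `FakeBrowser`, `Action`, `choose_action`, and `run_agent` are hypothetical names, the "screenshot" is a plain dictionary rather than an image, and the hard-coded policy replaces the decision a real model would make from the screenshot.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Action:
    kind: str        # "click" or "type"
    x: float = 0.0   # normalized click coordinates
    y: float = 0.0
    text: str = ""   # text to enter for "type" actions


class FakeBrowser:
    """Stand-in for a real browser: one input field above one submit button."""

    def __init__(self) -> None:
        self.focused = False
        self.value = ""
        self.submitted = False

    def screenshot(self) -> dict:
        # A real agent would receive an image; a dict keeps the sketch simple.
        return {"focused": self.focused, "value": self.value,
                "submitted": self.submitted}

    def apply(self, action: Action) -> None:
        if action.kind == "click" and action.y < 0.5:
            self.focused = True       # upper half of the page: the input field
        elif action.kind == "click":
            self.submitted = True     # lower half of the page: the submit button
        elif action.kind == "type" and self.focused:
            self.value += action.text


def choose_action(shot: dict) -> Action:
    """Hypothetical policy; in practice the model picks the next action."""
    if not shot["focused"]:
        return Action("click", x=0.5, y=0.3)
    if not shot["value"]:
        return Action("type", text="hello")
    return Action("click", x=0.5, y=0.8)


def run_agent(browser: FakeBrowser,
              goal_reached: Callable[[dict], bool],
              max_steps: int = 10) -> int:
    """Screenshot, check the goal, act, repeat. Returns the number of actions."""
    for step in range(max_steps):
        shot = browser.screenshot()
        if goal_reached(shot):
            return step
        browser.apply(choose_action(shot))
    raise RuntimeError("goal not reached within max_steps")


browser = FakeBrowser()
steps = run_agent(browser, goal_reached=lambda shot: shot["submitted"])
print(steps, browser.value)  # prints: 3 hello
```

The `max_steps` cap is a design choice worth keeping in any real setup, since it stops an agent that never reaches its goal from looping indefinitely.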

How to navigate browsers with Claude Code

Next, I want to cover exactly how to navigate browsers using Claude Code. The principles I'll cover here apply to basically any coding agent, and I will not include techniques that cannot easily be applied in other coding agents.

First, if you're using Claude Code, it has a built-in Chrome integration that you can enable by typing the command below in the Claude Code window.

/chrome

Codex also has an equivalent command.

This simple command gives Claude access to open Chrome on your computer and use it to verify tasks.

I think the Chrome integration in Claude Code works fine, but it's not perfect.

I have had a better experience using the Playwright MCP, which you can install simply by telling Claude Code to do so:

Install the Playwright MCP to interact with the browser

After Claude has installed it, you need to restart Claude Code, and you will then be able to access the Playwright MCP. In my experience, Claude is more successful at completing tasks when using the Playwright MCP than when using the /chrome integration built into Claude Code.

Of course, if you use another coding agent, you can do exactly the same thing: tell it to install the Playwright MCP. The agent will install the MCP, you restart the agent, and you will have access to Playwright.
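If you want to check what the agent set up, the installation usually comes down to registering an MCP server entry in the agent's configuration. The sketch below builds such an entry in Python and prints it as JSON. The `mcpServers` layout and the `@playwright/mcp` package name are assumptions based on the Playwright MCP project's published setup instructions, and the file this entry belongs in depends on your agent, so verify both against the current documentation.

```python
import json

# Assumed MCP server entry: the Playwright MCP server started on demand via npx.
mcp_config = {
    "mcpServers": {
        "playwright": {
            "command": "npx",
            "args": ["@playwright/mcp@latest"],
        }
    }
}

# Print the entry as JSON, the format MCP configuration files typically use.
print(json.dumps(mcp_config, indent=2))
```

Comparing this against the configuration your agent actually wrote is a quick way to confirm that the installation step succeeded before you restart.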

How to get your agent to verify its implementation

Now that you've installed the Playwright MCP and given your agent the ability to interact with the browser, you can use it to test your implementation.

Whenever your agent implements something (for example, creates a new design from a design file), simply tell it to validate its own work at the end by going through the result in Chrome with the Playwright MCP.

It also helps to tell the agent not to stop and get back to you until it has fully verified its work. End-to-end verification, in this case, means actually interacting with the browser and seeing whether something works.

I usually use the /goal feature, which is available in both Codex and Claude Code and makes an agent keep working on a task until it's done. I then write something like this:

/goal continue working on the task, implementing  until you've 
fully implemented it and tested and verified it end to end by interacting
with the browser using the playwright MCP, taking screenshots, and
verifying your work, only come back to me once you've both implemented 
and fully tested the implementation successfully. 

This lets the agent keep working toward the goal, verify it, and only return to you once it has verified its work. This has saved me a lot of time and is very useful if you simply want the agent to implement designs.

Conclusion

In this article, I have covered how to use Claude Code to verify implementations in your browser. I first discussed why coding agents can and should interact with your browser. Then I covered how browser navigation with coding agents actually works, which is simple in theory. Finally, I covered how to navigate browsers using Claude Code or other coding agents.

I believe that browser navigation will remain important because so much of how people interact with the world goes through the browser. However, it's worth noting that coding agents are still more effective when using APIs and MCPs, so if you can interact with a service in those ways instead, you should do so.

