Build Your Own Local Code AI Agent with Gemma 4 and OpenCode

AI coding agents are now part of the normal development process.

Many people use them with cloud-hosted models, since that is simple to set up and gives access to more capable models.

But if you need to control costs, if you don't want to send your code to the cloud for privacy reasons, or if you're experimenting and want to better understand how the agent stack actually works, you might want to try a local setup.

This is what this post is about. Here, we will set up a local code agent with three pieces:

  • Ollama as the model server;
  • Gemma 4 as the local LLM;
  • OpenCode as the agent interface.

Finally, we will connect OpenCode to the local LLM.

Figure 1. General structure. (Photo by author)

1. Install Ollama

We start by installing Ollama, which will serve the Gemma 4 model locally.

If you've never used it before, Ollama is a runtime for downloading, running, and serving local models on your machine. Once set up, Ollama exposes a local API endpoint. This way, other tools (e.g., OpenCode) can talk to the model directly.

On Windows machines, you can install it with the official installer from the Ollama website.

Alternatively, you can install it from PowerShell using winget:

winget install Ollama.Ollama

After installation, you should be able to see Ollama in the Windows Start menu. You can launch it like any other app. Once it is running, you should see the Ollama icon in the system tray, and this means that the local Ollama service is running in the background.

Figure 2. Ollama App interface. (Photo by author)

In addition, you can open a new PowerShell window and check if the Ollama CLI is available:

ollama --version

If you are on a Linux machine, you can install Ollama by piping the official install script to sh:

curl -fsSL <URL of the official Ollama install script> | sh

After installation, check if Ollama is available:

ollama --version

Once Ollama is installed, it runs a local server on your machine. Later, OpenCode will talk to this local Ollama server instead of calling a cloud model provider.
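To confirm that the local server is actually listening, a minimal Python sketch along these lines can help. It assumes Ollama's default address `http://localhost:11434` and its `/api/version` route. Neither is stated in this post, so treat both as assumptions and adjust them if your installation differs. The helper name `ollama_version` is illustrative.

```python
import json
import urllib.error
import urllib.request

# Assumption: Ollama's default local address; change it if yours differs.
OLLAMA_URL = "http://localhost:11434"


def ollama_version(base_url: str = OLLAMA_URL, timeout: float = 2.0):
    """Return the server's version string, or None if it is unreachable."""
    try:
        with urllib.request.urlopen(f"{base_url}/api/version", timeout=timeout) as resp:
            return json.load(resp).get("version")
    except (urllib.error.URLError, OSError, ValueError):
        # Connection refused, timeout, or a body that is not valid JSON.
        return None


if __name__ == "__main__":
    version = ollama_version()
    print(f"Ollama is running (version {version})" if version else "Ollama is not reachable")
```

If this reports that Ollama is not reachable, check that the app is running in the system tray (Windows) or that the service was started (Linux).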


2. Download Gemma 4

Next, we prepare the local LLM. For this post, we will use Gemma 4.

Gemma 4 is a new open model released by Google on April 2, 2026. It is designed for reasoning, coding, multimodal understanding, and agentic workflows.

It comes in several sizes, including smaller variants aimed at edge devices and larger variants aimed at workstations. Since this post is about running the model locally on a laptop, we will focus on the variants that fit within that limit, i.e., E2B (gemma4:e2b) and E4B (gemma4:e4b).

In Ollama's naming, the E stands for “effective” parameters.

For this walkthrough, I use the E4B model as it is the more capable of the two. In PowerShell:

ollama pull gemma4:e4b

On Linux, use the same command:

ollama pull gemma4:e4b

You can check the downloaded model:

ollama list

On my machine, Ollama reports the following:

gemma4:e4b    9.6 GB
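The same check can be scripted against Ollama's HTTP API instead of the CLI. The sketch below assumes the `/api/tags` route returns JSON shaped like `{"models": [{"name": ...}]}`. The helper name `has_model` and the sample payload are illustrative only, not real output.

```python
def has_model(tags_response: dict, name: str) -> bool:
    """Check whether a model tag appears in a parsed /api/tags response."""
    models = tags_response.get("models", [])
    return any(entry.get("name") == name for entry in models)


# Illustrative payload with made-up contents, not real server output.
sample = {"models": [{"name": "gemma4:e4b"}]}
print(has_model(sample, "gemma4:e4b"))  # True
print(has_model(sample, "gemma4:e2b"))  # False
```

In practice, `tags_response` would be the parsed JSON body of a GET request to the local server.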

For reference, my laptop has an Intel i7-13800H CPU, 32 GB RAM, and an NVIDIA RTX 2000 Ada Laptop GPU with about 8 GB VRAM. You can choose gemma4:e2b instead if E4B feels too slow.

A few technical notes here. The gemma4:e4b version we downloaded earlier is a 4-bit quantized model in GGUF, the local model format used by the Ollama runtime. On my machine, Ollama reports that gemma4:e4b supports a context length of up to 128K.

Before moving on to the next step, we can do a quick test:

ollama run gemma4:e4b "what's the capital of France?"

If you get “Paris” back, congratulations, Gemma 4 is now available on your local machine via Ollama.

Note that the first call may be slow because Ollama has to load the model. Once the model is warm, subsequent calls should respond faster.
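The same smoke test can be run over HTTP rather than through the CLI. This is a sketch under stated assumptions: Ollama's default address `http://localhost:11434`, its `/api/generate` route with streaming disabled, and a `response` field in the reply. The function names are illustrative.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434"  # assumption: Ollama's default address


def build_generate_request(model: str, prompt: str) -> dict:
    """Build the JSON body for a single, non-streamed completion."""
    return {"model": model, "prompt": prompt, "stream": False}


def generate(model: str, prompt: str, base_url: str = OLLAMA_URL) -> str:
    """Send the prompt to the local server and return the generated text."""
    body = json.dumps(build_generate_request(model, prompt)).encode("utf-8")
    req = urllib.request.Request(
        f"{base_url}/api/generate",
        data=body,
        headers={"Content-Type": "application/json"},
    )
    # Generous timeout: the first call also has to load the model.
    with urllib.request.urlopen(req, timeout=300) as resp:
        return json.load(resp)["response"]
```

With the server running, `generate("gemma4:e4b", "what's the capital of France?")` would send the same question as the CLI test above.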


3. Install OpenCode

Next, we need an agent interface. We will use OpenCode for that.

If you've used tools like Claude Code or Codex, OpenCode belongs in the same broad category. You can think of it as an agent runtime that can run within a local repo, inspect files, run commands, and perform various tasks.

A key difference that matters to us is that OpenCode is open source and agnostic about LLM providers. You can connect it to cloud models (e.g., Claude/GPT/Gemini models), or you can connect it to a local model served by Ollama.

That's exactly what we're going to do here.

If you're on a Windows machine, you'll first need to install Node.js. You can do that with:

winget install OpenJS.NodeJS.LTS

On Debian- or Ubuntu-based Linux, you can use:

sudo apt update
sudo apt install -y nodejs npm

After installation, open a new terminal (PowerShell on Windows) and check that both node and npm are available:

node --version
npm --version

Now we can install OpenCode:

npm install -g opencode-ai

Then confirm the installation:

opencode --version
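If you prefer a single check over running each `--version` command by hand, a small Python sketch like the one below can report which of the tools are missing from your PATH. The helper name `missing_tools` is hypothetical, and the check only confirms that an executable is found, not that it works.

```python
import shutil


def missing_tools(tools):
    """Return the entries of `tools` that are not found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools(["ollama", "node", "npm", "opencode"])
    print("All tools found" if not missing else f"Missing: {', '.join(missing)}")
```

Run it in a fresh terminal, since PATH changes made by installers usually only apply to newly opened windows.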

At this point, OpenCode is installed. You can launch the OpenCode TUI (terminal UI) from any project folder by running:

opencode

Figure 3. OpenCode TUI. (Photo by author)

4. Connect OpenCode to Gemma 4

By default, OpenCode doesn't know which model we want to use. Therefore, we need to point it to the Gemma 4 model served by Ollama.

Let's first create an Ollama model tag with the full context window (128K) enabled. This is important because we want the agent to work without its context being truncated.

We can do that with a small Ollama Modelfile. Specifically, we create a file called gemma4-e4b-128k.Modelfile in the folder/repo we want to work in:

FROM gemma4:e4b
PARAMETER num_ctx 131072

Then, in the command line, we create a new Ollama tag with:

ollama create gemma4:e4b-128k -f gemma4-e4b-128k.Modelfile

One thing to point out: this does not download a new model. It only builds an Ollama profile that uses the same Gemma 4 E4B weights but explicitly sets the runtime context window to 128K.
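The value 131072 is simply 128 × 1024 tokens. As a rough illustration, the Python sketch below renders the same two-line Modelfile for any window size. The helper name `modelfile_text` is hypothetical and not part of Ollama.

```python
def modelfile_text(base_model: str, context_k: int) -> str:
    """Render a two-line Modelfile; context_k is the window in units of 1024 tokens."""
    return f"FROM {base_model}\nPARAMETER num_ctx {context_k * 1024}\n"


# 128 * 1024 = 131072, the value used in the Modelfile above.
print(modelfile_text("gemma4:e4b", 128))
```

A larger context window generally needs more memory at runtime, so a smaller value (for example 32, giving 32768) may be worth trying if the 128K profile does not fit comfortably on your machine.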

Now we can connect OpenCode to the Gemma 4 model. For that, we need to create an opencode.json file in the project folder:

{
  "$schema": "<OpenCode config schema URL>",
  "provider": {
    "ollama": {
      "npm": "@ai-sdk/openai-compatible",
      "name": "Ollama (local)",
      "options": {
        "baseURL": "<URL of the local Ollama OpenAI-compatible endpoint>"
      },
      "models": {
        "gemma4:e4b-128k": {
          "name": "Gemma 4 E4B 128K"
        }
      }
    }
  },
  "model": "ollama/gemma4:e4b-128k"
}

Two important parts here:

First, OpenCode talks to Ollama through a local OpenAI-compatible endpoint, which is set in the baseURL option.

Second, note that we name the model following the OpenCode provider/model format:

ollama/gemma4:e4b-128k

This uses the model tag we created above.
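A common mistake is a mismatch between the top-level "model" value and the model keys declared under the provider. The Python sketch below shows how the provider/model string breaks down and checks that the two agree. It is an illustrative consistency check with hypothetical helper names, not OpenCode's own validation, and the config is trimmed to the fields it inspects.

```python
def split_model_ref(ref: str):
    """Split 'provider/model' on the first slash only."""
    provider, _, model = ref.partition("/")
    return provider, model


def model_is_declared(config: dict) -> bool:
    """True if the top-level "model" points at a model listed under its provider."""
    provider, model = split_model_ref(config.get("model", ""))
    models = config.get("provider", {}).get(provider, {}).get("models", {})
    return model in models


# Trimmed-down version of the opencode.json above.
config = {
    "provider": {"ollama": {"models": {"gemma4:e4b-128k": {"name": "Gemma 4 E4B 128K"}}}},
    "model": "ollama/gemma4:e4b-128k",
}
print(split_model_ref(config["model"]))  # ('ollama', 'gemma4:e4b-128k')
print(model_is_declared(config))         # True
```

Note that the colon in gemma4:e4b-128k belongs to the Ollama tag, so only the first slash separates the provider from the model.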

Now, if you launch OpenCode in the same project folder by using:

opencode

You should see gemma4:e4b-128k in the model list.

Figure 4. OpenCode linked to the Gemma 4 local model. (Image by author)

Now we are all set!


5. What Can You Do With This Setup?

With the OpenCode TUI launched, you can test your setup by asking the agent to perform a few tasks. For example, you can ask it to write a README file, implement certain functions, create test scripts, etc.

Beyond coding, you can also ask the agent to perform many workspace tasks, such as file manipulation, content extraction, and more.

OpenCode also gives you room to expand the setup. You can connect tools to the agent, install agent skills with SKILL.md, and give the agent project-specific instructions with AGENTS.md.

In addition, you can run tasks from the command line with:

opencode run "Summarize this repository."
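Because `opencode run` is non-interactive, it can also be called from a script. The Python sketch below wraps the command shown above. The function names are hypothetical, and it assumes the task's output is written to standard output.

```python
import shutil
import subprocess


def build_run_command(prompt: str, executable: str = "opencode") -> list:
    """Argument list for a one-shot, non-interactive OpenCode run."""
    return [executable, "run", prompt]


def run_task(prompt: str, cwd: str = ".") -> str:
    """Run one task in the given project folder and return what OpenCode printed."""
    executable = shutil.which("opencode")
    if executable is None:
        raise RuntimeError("opencode was not found on PATH")
    result = subprocess.run(
        build_run_command(prompt, executable),
        cwd=cwd,             # run inside the project so opencode.json is picked up
        capture_output=True,
        text=True,
        check=True,          # raise if OpenCode exits with an error
    )
    return result.stdout
```

For example, `run_task("Summarize this repository.", cwd="path/to/project")` would run the same task as the command above from within that folder.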

For more programmatic use, OpenCode can also run as a server, so the TUI is not the only interface.

And here's the most important thing: all your data remains fully local.

The official OpenCode documentation covers each of these topics: the CLI, skills, MCP, and server mode.
