Machine Learning

Run Local LLM with OpenClaw on Your Mac Mini

Bought a Mac Mini for Openclaw. It's complete.

Of late, Anthropic has pushed OpenClaw users to its token payment API1turning what was once a one-time hardware purchase into an ongoing (huge) expense.2. Even if you use OpenAI, you will still pay less every month.

💵💵 Using the local model eliminates the monthly cost of your OpenClaw agents, completely. 💵💵

However, installation and configuration can all be confusing, especially if you are new to local LLMs.

In this article, I'll show you how to create a local LLM (in a very painless way) on your Mac Mini that can power your agent for free.

You can use it even if you are a beginner.

🤨 “I heard that local LLMs don't work, is that true?”

Local LLM (proper setup) will do almost inseparable for functions such as emails, calendar management, reminders, IoT home automation and basic internet research (things you actually do with OpenClaw).

If you need to do something more advanced, such as using OpenClaw for software engineering, there is a link below that highlights how to set up a fallback model.

⚠️Note: This guide is not a complete OpenClaw tutorial.

It is intended to help you get your local LLM up and running with your agent as quickly as possible.

Computer hardware

This article was tested on a Mac Mini with the following details

OS macOS Tahoe
Version 26.3.1
The processor M2
The cores 8
Integrated Memory 24GB

If you are thinking of buying a Mac Mini, I would recommend at least an M2+ processor with at least 24GB of RAM. You can get away with 16GB, however, things will be more complicated and you may encounter errors with larger instances.

Setting things up

First, install OpenClaw using the official guide. If you have already done this, skip this step.

1. Install llama.cpp

We are going jump using Ollama (recommended local provider), and select llama.cpp. Using the estimated model and llama.cpp, we can speed up prediction by 70%

We need to build llama.cpp from source with metal flags on again until off. This handles some of the configuration required to run the model on your Mac at full speed. Just follow the steps below.

1️⃣ First, in your home directory, enter some requirements using brew.

# paste this into your terminal
$ brew install cmake curl

2️⃣ Then, build llama.cpp with the appropriate flags

# Clone llama.cpp
git clone 

# Configure build with Metal acceleration
cmake llama.cpp -B llama.cpp/build 
    -DBUILD_SHARED_LIBS=OFF 
    -DGGML_METAL=ON 
    -DGGML_CUDA=OFF

# Build
cmake --build llama.cpp/build 
    --config Release 
    -j$(sysctl -n hw.ncpu) 
    --clean-first 
    --target llama-cli llama-mtmd-cli llama-server llama-gguf-split

Now, we have llama.cpp available for you to use

2. Download a local LLM

As already mentioned, the key to getting good performance is in the local model quantization.

Standardization allows us to use a larger, more capable model, intelligently “compressed” to fit smaller hardware. This allows the scaled model to retain much of the performance of its full-size source model.

Unless you have a large GPU or a Mac with a high amount of integrated memory (80GB+ VRAM) quantizing is required

Blindly following the OpenClaw documentation while trying to use the limited model will leave you confused and frustrated.

There is no guidance available that clearly explains how multi-valued models work with agents.

Below is a checked recipe that will work for your agent.

Model Selection: Qwen 3.5-9B

Here we use Qwen 3.5 (parameter version 9B).

As of June 2026, it is the leading player in local models, outscoring Gemma 4-12B. This will fit on either a 16GB or 24GB Mac with a total of 6-8GB of RAM required. Users also rate this very high in OpenClaw.

And remember that agents need long content, which will prevent us from using the large version of 27B, even for moderation.

1️⃣ Let's download the model

# download model
 curl -L -o models/Qwen3.5-9B-UD-Q4_K_KL.gguf 
"

2️⃣ Download the template, save it to the templates.

mkdir templates && 
curl -o templates/qwen35.jinja 
"

Importantly, you must use a template that is compatible with the OpenClaw agent. Without this step, nothing will work.

3. Start llama-server

Llama-server will serve as our backend API. OpenClaw will use this web service instead of calling the API from OpenAI or Anthropic directly.

We have already installed llama-server, and downloaded our model. Let's start a quick test.

1️⃣ Do a quick test

 ./llama.cpp/llama-server 
  -m models/Qwen3.5-9B-UD-Q4_K_XL.gguf 
  --chat-template-file templates/qwen35.jinja 
  --temp 0.7 
  --top-p 0.9 
  --top-k 20 
  -c 64000 
  -ngl 20 
  --host 127.0.0.1 
  --port 8080

You should see something like that (without the errors)

 srv  llama_server: model loaded
 llama_server: server is listening on 
 update_slots: all slots are idle

2️⃣ Now, let's write a launchd daemon, so that your local LLM server starts automatically and remains available after a reboot. If you are familiar with Linux, launchd actually systemd for macOS

Save the following as /Library/LaunchDaemons/com.openclaw.llama-server.plist. You will need to use sudo of this.

Expand this to plist file

❗Make sure you change YOUR_USERNAME with your actual username in the xml.








Label
com.openclaw.llama-server

UserName
YOUR_USERNAME

ProgramArguments

    /Users/YOUR_USERNAME/llama.cpp/llama-server

    -m
    /Users/YOUR_USERNAME/models/Qwen3.5-9B-UD-Q4_K_XL.gguf

    --chat-template-file
    /Users/YOUR_USERNAME/templates/qwen35.jinja

    --temp
    0.7

    --top-p
    0.9

    --top-k
    20

    -c
    64000

    -ngl
    20

    --host
    127.0.0.1

    --port
    8080


WorkingDirectory
/Users/YOUR_USERNAME

RunAtLoad


KeepAlive


StandardOutPath
/tmp/llama-server.log

StandardErrorPath
/tmp/llama-server.err


Now, allow it.

sudo chown root:wheel /Library/LaunchDaemons/com.openclaw.llama-server.plist && 
sudo chmod 644 /Library/LaunchDaemons/com.openclaw.llama-server.plist && 
sudo launchctl bootstrap system /Library/LaunchDaemons/com.openclaw.llama-server.plist

We can check to see if the service is working properly by adding a tail to our log file

tail -f /tmp/llama-server.err

Now we both have our local LLM loaded, effectively running as a daemon. What we need to do now is to reconfigure OpenClaw.

4. Reconfigure OpenClaw to use the local model

Now we need to add this local model to our OpenClaw setting to make it usable in our gateway.

1️⃣ Add to the “models” block in .openclaw/openclaw.json

{
  "models": {
    "providers": {
      "local": {
        "baseUrl": "/v1",
        "apiKey": "sk-local",
        "api": "openai-completions",
        "models": [
          {
            "id": "qwen3-9b",
            "name": "Qwen3.5 9B Local",
            "contextWindow": 64000,
            "maxTokens": 8192
          }
        ]
      }
    /* REMOVE THIS COMMENT */
    /* you may add additional providers, like anthropic here */ 
    }
  }
}

Be careful: settings for contextWindow again maxTokens it may need to be adjusted for your specific application

You will also need to set up a default model for your agents

"agents": {
    "defaults": {
      "model": {
        "primary": "local/qwen3-9b"
      },
      "models": {
        "local/qwen3-9b": {}
      }
 }

It is also useful to verify that the config is correct, use the command below to check the syntax

openclaw config validate

2️⃣ Restart the gateway, making sure that the local model is now available

openclaw gateway restart

3️⃣ Check to see if OpenClaw has correctly registered our local model

openclaw models list --provider local

And we can use a simple index call

openclaw infer model run 
  --model local/qwen3-9b 
  --prompt "Reply with exactly: pong" 
  --json

You should get a JSON object in return. Important: make sure you don't have any leaked tags in the response. It shouldn't, but this is doubly important for security

{
  "ok": true,
  "capability": "model.run",
  "transport": "local",
  "provider": "local",
  "model": "qwen3-9b",
  "attempts": [],
  "outputs": [
    {
      "text": "pong",
      "mediaUrl": null
    }
  ]
}

Now we have confirmed that all the pipes are working properly. To be absolutely sure (or if this is your first agent), let's set up a sample of the skill, and make sure that the model sets the reasons correctly and makes the tool call as expected.

5. Validate performance with test skills

Let's create a test capability for 'python-calc', which will allow us to check if our local model can correctly consult and issue tool calls.

1️⃣ Run this to create the skill. This will add this tool to all your openclaw agents.

mkdir -p ~/.openclaw/workspace/skills/python-calc

cat << 'EOF' > ~/.openclaw/workspace/skills/python-calc/SKILL.md
---
name: python-calc
description: A tool that evaluates mathematical expressions by executing a Python one-liner.
version: 1.0.0
---
## Instructions
1. Extract the exact mathematical expression the user wants to calculate.
2. Use your built-in shell tool to run this exact command, replacing `` with the expression: `python3 -c "print()"`
3. Wait for the shell tool to return the stdout output.
4. You MUST generate a final conversational response to the user containing the exact numeric result returned by the script.
EOF

Again, restart the gateway.

2️⃣ Now, we can run the sample agent call to ensure the tool exits correctly:

openclaw agent --local --agent main --verbose on --thinking high --message 
"Use the python-calc skill to calculate 8664 multiplied by 222. 
Do not use skill_workshop. Tell me the final answer."

And, after a second or so, if everything is working fine, we should do something like this:

The final answer is 1,923,408.

It's delicious!

In fact, we can see speeds of up to 20-70 tokens per second*. Although this is not Claude's speed (130 tps+), this is quite reasonable for an OpenClaw agent with minimal hardware.

Remember, the thinking mode is set to high, so it's okay if it takes a while.

If you are not sure whether openclaw is using your model or not, in another terminal window, fill the llama-server log by running tail -f /tmp/llama-server.err

*Your actual speeds may vary

Wrapping up

Using a local LLM, especially with customized templates and sizing, can be very frustrating. Setting this up for the first time on a friend's Mac took 2 days back and forth! Thanks to Jacob W. for the inspiration.

That's all! I hope this saves you a lot 💸

If it does, or if I save you a headache, you can also buy me a coffee here.

☕ Hello!

1 Tweet by Boris Cherny, discussing the “ban” of OpenClaw

2 User spends $420 per month on API fees

3 Using multiple providers with OpenClaw

Source link

Related Articles

Leave a Reply

Your email address will not be published. Required fields are marked *

Back to top button