Run Local LLM with OpenClaw on Your Mac Mini

Bought a Mac Mini for Openclaw. It's complete.
Of late, Anthropic has pushed OpenClaw users to its token payment API1turning what was once a one-time hardware purchase into an ongoing (huge) expense.2. Even if you use OpenAI, you will still pay less every month.
💵💵 Using the local model eliminates the monthly cost of your OpenClaw agents, completely. 💵💵
However, installation and configuration can all be confusing, especially if you are new to local LLMs.
In this article, I'll show you how to create a local LLM (in a very painless way) on your Mac Mini that can power your agent for free.
You can use it even if you are a beginner.
🤨 “I heard that local LLMs don't work, is that true?”
Local LLM (proper setup) will do almost inseparable for functions such as emails, calendar management, reminders, IoT home automation and basic internet research (things you actually do with OpenClaw).
If you need to do something more advanced, such as using OpenClaw for software engineering, there is a link below that highlights how to set up a fallback model.
⚠️Note: This guide is not a complete OpenClaw tutorial.
It is intended to help you get your local LLM up and running with your agent as quickly as possible.
Computer hardware
This article was tested on a Mac Mini with the following details
| OS | macOS Tahoe |
| Version | 26.3.1 |
| The processor | M2 |
| The cores | 8 |
| Integrated Memory | 24GB |
If you are thinking of buying a Mac Mini, I would recommend at least an M2+ processor with at least 24GB of RAM. You can get away with 16GB, however, things will be more complicated and you may encounter errors with larger instances.
Setting things up
First, install OpenClaw using the official guide. If you have already done this, skip this step.
1. Install llama.cpp
We are going jump using Ollama (recommended local provider), and select llama.cpp. Using the estimated model and llama.cpp, we can speed up prediction by 70%
We need to build llama.cpp from source with metal flags on again until off. This handles some of the configuration required to run the model on your Mac at full speed. Just follow the steps below.
1️⃣ First, in your home directory, enter some requirements using brew.
# paste this into your terminal
$ brew install cmake curl
2️⃣ Then, build llama.cpp with the appropriate flags
# Clone llama.cpp
git clone
# Configure build with Metal acceleration
cmake llama.cpp -B llama.cpp/build
-DBUILD_SHARED_LIBS=OFF
-DGGML_METAL=ON
-DGGML_CUDA=OFF
# Build
cmake --build llama.cpp/build
--config Release
-j$(sysctl -n hw.ncpu)
--clean-first
--target llama-cli llama-mtmd-cli llama-server llama-gguf-split
Now, we have llama.cpp available for you to use
2. Download a local LLM
As already mentioned, the key to getting good performance is in the local model quantization.
Standardization allows us to use a larger, more capable model, intelligently “compressed” to fit smaller hardware. This allows the scaled model to retain much of the performance of its full-size source model.
Unless you have a large GPU or a Mac with a high amount of integrated memory (80GB+ VRAM) quantizing is required
Blindly following the OpenClaw documentation while trying to use the limited model will leave you confused and frustrated.
There is no guidance available that clearly explains how multi-valued models work with agents.
Below is a checked recipe that will work for your agent.
Model Selection: Qwen 3.5-9B
Here we use Qwen 3.5 (parameter version 9B).
As of June 2026, it is the leading player in local models, outscoring Gemma 4-12B. This will fit on either a 16GB or 24GB Mac with a total of 6-8GB of RAM required. Users also rate this very high in OpenClaw.
And remember that agents need long content, which will prevent us from using the large version of 27B, even for moderation.
1️⃣ Let's download the model
# download model
curl -L -o models/Qwen3.5-9B-UD-Q4_K_KL.gguf
"
2️⃣ Download the template, save it to the templates.
mkdir templates &&
curl -o templates/qwen35.jinja
"
Importantly, you must use a template that is compatible with the OpenClaw agent. Without this step, nothing will work.
3. Start llama-server
Llama-server will serve as our backend API. OpenClaw will use this web service instead of calling the API from OpenAI or Anthropic directly.
We have already installed llama-server, and downloaded our model. Let's start a quick test.
1️⃣ Do a quick test
./llama.cpp/llama-server
-m models/Qwen3.5-9B-UD-Q4_K_XL.gguf
--chat-template-file templates/qwen35.jinja
--temp 0.7
--top-p 0.9
--top-k 20
-c 64000
-ngl 20
--host 127.0.0.1
--port 8080
You should see something like that (without the errors)
srv llama_server: model loaded
llama_server: server is listening on
update_slots: all slots are idle
2️⃣ Now, let's write a launchd daemon, so that your local LLM server starts automatically and remains available after a reboot. If you are familiar with Linux, launchd actually systemd for macOS
Save the following as /Library/LaunchDaemons/com.openclaw.llama-server.plist. You will need to use sudo of this.
Expand this to plist file
❗Make sure you change YOUR_USERNAME with your actual username in the xml.
Label
com.openclaw.llama-server
UserName
YOUR_USERNAME
ProgramArguments
/Users/YOUR_USERNAME/llama.cpp/llama-server
-m
/Users/YOUR_USERNAME/models/Qwen3.5-9B-UD-Q4_K_XL.gguf
--chat-template-file
/Users/YOUR_USERNAME/templates/qwen35.jinja
--temp
0.7
--top-p
0.9
--top-k
20
-c
64000
-ngl
20
--host
127.0.0.1
--port
8080
WorkingDirectory
/Users/YOUR_USERNAME
RunAtLoad
KeepAlive
StandardOutPath
/tmp/llama-server.log
StandardErrorPath
/tmp/llama-server.err
Now, allow it.
sudo chown root:wheel /Library/LaunchDaemons/com.openclaw.llama-server.plist &&
sudo chmod 644 /Library/LaunchDaemons/com.openclaw.llama-server.plist &&
sudo launchctl bootstrap system /Library/LaunchDaemons/com.openclaw.llama-server.plist
We can check to see if the service is working properly by adding a tail to our log file
tail -f /tmp/llama-server.err
Now we both have our local LLM loaded, effectively running as a daemon. What we need to do now is to reconfigure OpenClaw.
4. Reconfigure OpenClaw to use the local model
Now we need to add this local model to our OpenClaw setting to make it usable in our gateway.
1️⃣ Add to the “models” block in .openclaw/openclaw.json
{
"models": {
"providers": {
"local": {
"baseUrl": "/v1",
"apiKey": "sk-local",
"api": "openai-completions",
"models": [
{
"id": "qwen3-9b",
"name": "Qwen3.5 9B Local",
"contextWindow": 64000,
"maxTokens": 8192
}
]
}
/* REMOVE THIS COMMENT */
/* you may add additional providers, like anthropic here */
}
}
}
Be careful: settings for
contextWindowagainmaxTokensit may need to be adjusted for your specific application
You will also need to set up a default model for your agents
"agents": {
"defaults": {
"model": {
"primary": "local/qwen3-9b"
},
"models": {
"local/qwen3-9b": {}
}
}
It is also useful to verify that the config is correct, use the command below to check the syntax
openclaw config validate
2️⃣ Restart the gateway, making sure that the local model is now available
openclaw gateway restart
3️⃣ Check to see if OpenClaw has correctly registered our local model
openclaw models list --provider local
And we can use a simple index call
openclaw infer model run
--model local/qwen3-9b
--prompt "Reply with exactly: pong"
--json
You should get a JSON object in return. Important: make sure you don't have any leaked tags in the response. It shouldn't, but this is doubly important for security
{
"ok": true,
"capability": "model.run",
"transport": "local",
"provider": "local",
"model": "qwen3-9b",
"attempts": [],
"outputs": [
{
"text": "pong",
"mediaUrl": null
}
]
}
Now we have confirmed that all the pipes are working properly. To be absolutely sure (or if this is your first agent), let's set up a sample of the skill, and make sure that the model sets the reasons correctly and makes the tool call as expected.
5. Validate performance with test skills
Let's create a test capability for 'python-calc', which will allow us to check if our local model can correctly consult and issue tool calls.
1️⃣ Run this to create the skill. This will add this tool to all your openclaw agents.
mkdir -p ~/.openclaw/workspace/skills/python-calc
cat << 'EOF' > ~/.openclaw/workspace/skills/python-calc/SKILL.md
---
name: python-calc
description: A tool that evaluates mathematical expressions by executing a Python one-liner.
version: 1.0.0
---
## Instructions
1. Extract the exact mathematical expression the user wants to calculate.
2. Use your built-in shell tool to run this exact command, replacing `` with the expression: `python3 -c "print()"`
3. Wait for the shell tool to return the stdout output.
4. You MUST generate a final conversational response to the user containing the exact numeric result returned by the script.
EOF
Again, restart the gateway.
2️⃣ Now, we can run the sample agent call to ensure the tool exits correctly:
openclaw agent --local --agent main --verbose on --thinking high --message
"Use the python-calc skill to calculate 8664 multiplied by 222.
Do not use skill_workshop. Tell me the final answer."
And, after a second or so, if everything is working fine, we should do something like this:
The final answer is 1,923,408.
It's delicious!
In fact, we can see speeds of up to 20-70 tokens per second*. Although this is not Claude's speed (130 tps+), this is quite reasonable for an OpenClaw agent with minimal hardware.
Remember, the thinking mode is set to high, so it's okay if it takes a while.
If you are not sure whether openclaw is using your model or not, in another terminal window, fill the llama-server log by running
tail -f /tmp/llama-server.err
*Your actual speeds may vary
Wrapping up
Using a local LLM, especially with customized templates and sizing, can be very frustrating. Setting this up for the first time on a friend's Mac took 2 days back and forth! Thanks to Jacob W. for the inspiration.
That's all! I hope this saves you a lot 💸
If it does, or if I save you a headache, you can also buy me a coffee here.
☕ Hello!
1 Tweet by Boris Cherny, discussing the “ban” of OpenClaw
2 User spends $420 per month on API fees
3 Using multiple providers with OpenClaw



