Best Local Models for OpenClaw with Ollama (2026)
Ollama became an official OpenClaw provider in March 2026. That means you can run OpenClaw entirely on your own hardware with no API key and no per-token cost. This guide compares the best local models, lists the hardware you need, and walks through setup.
Why Local Models Matter for OpenClaw
Cloud APIs bill per token; local models through Ollama cost nothing beyond your hardware and electricity. The most important requirement for OpenClaw is context length — at least 64K tokens for reliable tool use.
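One thing to watch: Ollama's default context window is often far smaller than a model's advertised maximum, which can silently truncate long agent sessions. A minimal sketch of raising it with a Modelfile — the derived model name `qwen3.5-64k` is just a name we chose for illustration:

```shell
# Write a Modelfile that extends the base model's context window
cat > Modelfile <<'EOF'
FROM qwen3.5:27b
PARAMETER num_ctx 65536
EOF

# Build the derived model, then point OpenClaw at it instead of the base tag
ollama create qwen3.5-64k -f Modelfile
```

A larger `num_ctx` increases memory use, so size it to what your hardware can hold alongside the weights.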
Model Comparison Table
| Model | Size | Context | Tool Reliability | Speed | Best For |
|---|---|---|---|---|---|
| Qwen3.5 27B | 27B | 128K | Excellent | Fast | Best all-around pick |
| Llama 3.3 70B | 70B | 128K | Excellent | Moderate | Maximum quality |
| Mistral Large | 123B | 128K | Excellent | Slow | Complex reasoning |
| DeepSeek V3 | 671B MoE | 128K | Excellent | Slow | Top-tier quality |
| Qwen2.5 Coder 32B | 32B | 128K | Good | Fast | Code-heavy workflows |
| Llama 3.1 8B | 8B | 128K | Fair | Very Fast | Simple tasks, low-RAM |
| Phi-4 14B | 14B | 64K | Good | Fast | Budget midrange |
| Command R+ 104B | 104B | 128K | Good | Slow | RAG tasks |
Qwen3.5 27B is our top recommendation for most users. For long-horizon autonomous runs, see GLM-5.1 below.
GLM-5.1: The Current #1 Open-Source Model for 8-Hour Autonomous Runs
As of April 2026, Zhipu AI’s GLM-5.1 holds the #1 spot on SWE-Bench Pro among open-source models. The release got immediate signal boost from Ollama’s official account (1,673 likes) and Hugging Face’s @victormustar (1,300 likes), which tells you something: the infrastructure community, not just the model leaderboards, is paying attention.
Key specs. GLM-5.1 ships in two public sizes: a 32B dense variant for single-GPU deployment and a 355B Mixture-of-Experts variant that activates roughly 32B parameters per token. Context window is 128K natively with a 1M-token extended mode. Released by Zhipu AI (z.ai), a Beijing-based lab that has been shipping competitive open weights since the GLM-4 line. License permits commercial use with standard redistribution terms.
Where it shines. GLM-5.1 was explicitly tuned for multi-turn autonomous runs that exceed four hours. Anecdotal reports on X describe clean 8-hour agent loops with no drift on tool schemas, correct JSON argument shaping through hundreds of calls, and stable context management when paired with OpenClaw’s /compact workflow. Tool-calling accuracy on the BFCL benchmark is within 2 points of Claude 3.5 Sonnet. This is the model you pick when you are leaving an OpenClaw agent running overnight.
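To make "correct JSON argument shaping" concrete, here is a sketch of the OpenAI-style tool schema that Ollama's `/api/chat` endpoint accepts. The `list_files` tool is a hypothetical example for illustration, not a real OpenClaw tool:

```python
import json

def build_tool_payload(model: str, prompt: str) -> dict:
    """Build a chat request with one tool definition attached.

    "list_files" and its "path" parameter are illustrative names;
    the schema shape is the OpenAI-style function format Ollama accepts.
    """
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "tools": [{
            "type": "function",
            "function": {
                "name": "list_files",
                "description": "List files in a directory",
                "parameters": {
                    "type": "object",
                    "properties": {
                        "path": {"type": "string", "description": "Directory to list"},
                    },
                    "required": ["path"],
                },
            },
        }],
        "stream": False,
    }

payload = build_tool_payload("glm5.1:32b", "List the files in my home directory")
print(json.dumps(payload, indent=2))
# To actually send it (requires a running Ollama server):
# requests.post("http://localhost:11434/api/chat", json=payload)
```

A model that holds this argument schema steady across hundreds of calls is what "no drift on tool schemas" means in practice.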
Hardware. The 32B dense version needs roughly 24 GB VRAM for Q4 quantization (fits on an RTX 4090, RTX A6000, or M3 Max 48GB). CPU fallback works on a machine with 48 GB unified RAM or more, though expect 2-4 tokens per second rather than the 40+ you will see on GPU. The 355B MoE variant is server-class only.
Install and configure for OpenClaw:
```shell
# Pull the 32B dense variant
ollama pull glm5.1:32b

# Set as OpenClaw's default chat model
openclaw config set agents.defaults.models.chat ollama/glm5.1:32b

# Verify
openclaw models status

# Smoke test with a tool call
openclaw chat "List the three largest files in my home directory"
```
One caveat. GLM-5.1 is slower to first token than Qwen3.5 27B on short interactive chats, and its English prose is slightly stiffer. If your workload is mostly quick Q&A rather than long agent runs, you are better off with Qwen. GLM-5.1 is the right pick specifically for autonomy, not conversation.
Setting Up Any Model
```shell
# 1. Pull the model
ollama pull qwen3.5:27b

# 2. Set it as your default chat model
openclaw config set agents.defaults.models.chat ollama/qwen3.5:27b

# 3. Verify
openclaw models list

# 4. Test
openclaw chat "List the files in my home directory"
```
Minimum Specs by Model Size
| Model Size | Min RAM (CPU) | Min VRAM (GPU) | Example Hardware |
|---|---|---|---|
| 7-8B | 16 GB | 8 GB | M1/M2 MacBook, RTX 3070 |
| 14B | 24 GB | 12 GB | M2 Pro Mac, RTX 4070 |
| 27-32B | 32 GB | 24 GB | M3 Pro/Max Mac, RTX 4090 |
| 70B | 64 GB | 48 GB | M3 Ultra Mac, RTX A6000 |
| 100B+ | 128 GB | 80 GB+ | Mac Studio Ultra, A100/H100 |
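As a back-of-envelope check on the VRAM column, you can estimate Q4 memory use from parameter count. This sketch assumes roughly 4.8 bits per weight (typical of common Q4 GGUF variants) and about 2 GB of runtime overhead for KV cache and buffers at modest context — both ballpark assumptions, not measurements:

```python
def q4_vram_gb(params_billion: float, bits_per_weight: float = 4.8,
               overhead_gb: float = 2.0) -> float:
    """Rough VRAM estimate for a Q4-quantized model.

    bits_per_weight ~4.8 approximates common Q4 GGUF quants;
    overhead_gb covers KV cache and runtime buffers at modest context.
    """
    weights_gb = params_billion * 1e9 * bits_per_weight / 8 / 1e9
    return round(weights_gb + overhead_gb, 1)

for size in (8, 14, 32, 70):
    print(f"{size}B -> ~{q4_vram_gb(size)} GB")
```

The estimates land close to the table above (a 32B model at ~21 GB fits a 24 GB card; 70B at ~44 GB fits 48 GB), though long contexts push the KV cache well past the 2 GB assumed here.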
For a dedicated OpenClaw host, the Apple Mac mini M4 (16GB) handles models up to 14B comfortably.
Avoid Models Under 7B
No model under 7B passed OpenClaw’s tool-calling validation consistently. If your hardware can’t run at least a 7B model, use a free-tier cloud provider instead.
For more, see our full OpenClaw troubleshooting guide.