
Best Local Models for OpenClaw with Ollama (2026)

Ollama became an official OpenClaw provider in March 2026. That means you can run OpenClaw entirely on your own hardware with no API key and no per-token cost. This guide compares the best local models, lists the hardware you need, and walks through setup.

Why Local Models Matter for OpenClaw

Cloud APIs bill you per token; a local model served through Ollama costs nothing to run beyond the hardware you already own. The most important requirement for OpenClaw is context length: at least 64K tokens for reliable tool use.
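One practical note: Ollama typically serves models at a default context window far below 64K, so you may need to raise it yourself. Here is a minimal sketch using a standard Ollama Modelfile, with the qwen3.5:27b tag from the setup section below (the qwen3.5-64k name is our own choice):

# Ollama often loads models with a smaller context window than the model
# supports; this Modelfile pins it at 64K.
cat > Modelfile <<'EOF'
FROM qwen3.5:27b
PARAMETER num_ctx 65536
EOF

# Build the 64K variant and point OpenClaw at it
ollama create qwen3.5-64k -f Modelfile
openclaw config set agents.defaults.models.chat ollama/qwen3.5-64k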

Model Comparison Table

| Model | Size | Context | Tool Reliability | Speed | Best For |
|-------|------|---------|------------------|-------|----------|
| Qwen3.5 27B | 27B | 128K | Excellent | Fast | Best all-around pick |
| Llama 3.3 70B | 70B | 128K | Excellent | Moderate | Maximum quality |
| Mistral Large | 123B | 128K | Excellent | Slow | Complex reasoning |
| DeepSeek V3 | 671B MoE | 128K | Excellent | Slow | Top-tier quality |
| Qwen2.5 Coder 32B | 32B | 128K | Good | Fast | Code-heavy workflows |
| Llama 3.1 8B | 8B | 128K | Fair | Very Fast | Simple tasks, low-RAM |
| Phi-4 14B | 14B | 64K | Good | Fast | Budget midrange |
| Command R+ | 104B | 128K | Good | Slow | RAG tasks |

Qwen3.5 27B is our top recommendation for most users. For long-horizon autonomous runs, see GLM-5.1 below.

GLM-5.1: The Current #1 Open-Source Model for 8-Hour Autonomous Runs

As of April 2026, Zhipu AI’s GLM-5.1 holds the #1 spot on SWE-Bench Pro among open-source models. The release got an immediate signal boost from Ollama’s official account (1,673 likes) and Hugging Face’s @victormustar (1,300 likes), which tells you something: the infrastructure community, not just the model leaderboards, is paying attention.

Key specs. GLM-5.1 ships in two public sizes: a 32B dense variant for single-GPU deployment and a 355B Mixture-of-Experts variant that activates roughly 32B parameters per token. Context window is 128K natively with a 1M-token extended mode. Released by Zhipu AI (z.ai), a Beijing-based lab that has been shipping competitive open weights since the GLM-4 line. License permits commercial use with standard redistribution terms.

Where it shines. GLM-5.1 was explicitly tuned for multi-turn autonomous runs that exceed four hours. Anecdotal reports on X describe clean 8-hour agent loops with no drift on tool schemas, correct JSON argument shaping through hundreds of calls, and stable context management when paired with OpenClaw’s /compact workflow. Tool-calling accuracy on the BFCL benchmark is within 2 points of Claude 3.5 Sonnet. This is the model you pick when you are leaving an OpenClaw agent running overnight.
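If you want to try an overnight run yourself, the sketch below detaches a single agent invocation with standard shell tools. The prompt is illustrative, and everything outside the plain `openclaw chat "<prompt>"` form shown in this guide is ordinary POSIX shell:

# Hypothetical overnight run: detach the agent and keep a log.
# Only the `openclaw chat "<prompt>"` form from this guide is assumed.
nohup openclaw chat "Work through the open issues in ~/project and commit fixes" \
  > ~/openclaw-overnight.log 2>&1 &

# Check progress (or the aftermath) in the morning
tail -n 50 ~/openclaw-overnight.log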

Hardware. The 32B dense version needs roughly 24 GB VRAM for Q4 quantization (fits on an RTX 4090, RTX A6000, or M3 Max 48GB). CPU fallback works on a machine with 48 GB unified RAM or more, though expect 2-4 tokens per second rather than the 40+ you will see on GPU. The 355B MoE variant is server-class only.
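Before pulling 20+ GB of weights, confirm the headroom. Two quick checks, assuming an NVIDIA GPU for the first and Apple Silicon for the second:

# NVIDIA: you want roughly 24 GB free for the Q4 32B weights
nvidia-smi --query-gpu=name,memory.total,memory.free --format=csv

# Apple Silicon: check unified memory (48 GB+ for usable CPU fallback)
sysctl -n hw.memsize | awk '{printf "%.0f GB unified memory\n", $1/1073741824}'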

Install and configure for OpenClaw:

# Pull the 32B dense variant
ollama pull glm5.1:32b

# Set as OpenClaw's default chat model
openclaw config set agents.defaults.models.chat ollama/glm5.1:32b

# Verify
openclaw models status

# Smoke test with a tool call
openclaw chat "List the three largest files in my home directory"

One caveat. GLM-5.1 is slower to first token than Qwen3.5 27B on short interactive chats, and its English prose is slightly stiffer. If your workload is mostly quick Q&A rather than long agent runs, you are better off with Qwen. GLM-5.1 is the right pick specifically for autonomy, not conversation.

Setting Up Any Model

# 1. Pull the model
ollama pull qwen3.5:27b

# 2. Set it as your default chat model
openclaw config set agents.defaults.models.chat ollama/qwen3.5:27b

# 3. Verify
openclaw models list

# 4. Test
openclaw chat "List the files in my home directory"

Minimum Specs by Model Size

| Model Size | Min RAM (CPU) | Min VRAM (GPU) | Example Hardware |
|------------|---------------|----------------|------------------|
| 7-8B | 16 GB | 8 GB | M1/M2 MacBook, RTX 3070 |
| 14B | 24 GB | 12 GB | M2 Pro Mac, RTX 4070 |
| 27-32B | 32 GB | 24 GB | M3 Pro/Max Mac, RTX 4090 |
| 70B | 64 GB | 48 GB | M3 Ultra Mac, RTX A6000 |
| 100B+ | 128 GB | 80 GB+ | Mac Studio Ultra, A100/H100 |

For a dedicated OpenClaw host, the Apple Mac mini M4 (16GB) handles models up to 14B comfortably.
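If you want the table as a script, here is a rough sizing helper based on the CPU-only RAM column, assuming Linux (/proc/meminfo); on macOS substitute sysctl -n hw.memsize:

#!/usr/bin/env bash
# Rough sizing helper based on the CPU RAM column in the table above.
ram_gb=$(awk '/MemTotal/ {printf "%d", $2/1048576}' /proc/meminfo)

if   [ "$ram_gb" -ge 128 ]; then echo "Up to 100B+ models"
elif [ "$ram_gb" -ge 64 ];  then echo "Up to 70B models"
elif [ "$ram_gb" -ge 32 ];  then echo "Up to 27-32B models"
elif [ "$ram_gb" -ge 24 ];  then echo "Up to 14B models"
elif [ "$ram_gb" -ge 16 ];  then echo "Up to 7-8B models"
else echo "Under 16 GB: use a free-tier cloud provider instead"; fi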

Avoid Models Under 7B

No model under 7B passed OpenClaw’s tool-calling validation consistently. If your hardware tops out below that tier, use a free-tier cloud provider instead; a sketch of the config change follows.
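Switching the default away from Ollama uses the same config path shown earlier; the provider and model IDs below are placeholders rather than names from this guide:

# Placeholder IDs; run `openclaw models list` to see what your install offers
openclaw config set agents.defaults.models.chat <provider>/<model-id>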

For more, see our full OpenClaw troubleshooting guide.

