
Best Local LLM by GPU (2026): RTX 3090, 4090, 5090, A6000, M-series Picks

Your GPU (or unified-memory chip) is the biggest factor in which local LLM runs well. This hub maps the popular consumer, workstation, and Apple Silicon options to the best model that actually fits, with quants, tokens/sec, and the exact OpenClaw config. Click through to the dedicated GPU page for detailed picks.

Need help picking the right GPU for your model?

Book a Call at calendly.com/cloudyeti/meet. We'll match your workload to the cheapest GPU that runs it.

Pick Your GPU (2026)

Consumer NVIDIA

| Your GPU | VRAM | Best Pick | Speed | Detailed Guide |
| --- | --- | --- | --- | --- |
| RTX 3090 | 24 GB | Qwen 3.6 27B (Q4_K_M) | ~35 tok/s | 3090 guide → |
| RTX 4090 | 24 GB | Qwen 3.6 27B (Q4_K_M) | ~50 tok/s | 4090 guide → |
| RTX 5090 | 32 GB | Qwen 3.6 35B-A3B (Q6) (new) | ~80 tok/s | 5090 guide → |
| RTX 4070 Ti SUPER | 16 GB | Qwen 3.5 9B (Q8) | ~45 tok/s | 4070 Ti SUPER guide → |
| RTX 4060 Ti 16GB | 16 GB | gpt-oss 20B (Q4) | ~22 tok/s | 4060 Ti 16GB guide → |

Workstation NVIDIA

| Your GPU | VRAM | Best Pick | Speed | Detailed Guide |
| --- | --- | --- | --- | --- |
| RTX A6000 | 48 GB | GLM-5.1 32B or Qwen 3.6 27B (Q8) | ~28 tok/s | A6000 guide → |

Apple Silicon

| Your Mac | Unified RAM | Best Pick | Speed | Detailed Guide |
| --- | --- | --- | --- | --- |
| MacBook Pro M4 Max | 36-128 GB | Qwen 3.6 27B (Q6 or Q8) | ~25 tok/s | M4 Max guide → |
| Mac Studio M2 Ultra | 64-192 GB | gpt-oss 120B or Mistral Small 4 (119B-A6B) | ~25 tok/s | M2 Ultra guide → |

How to Read the Speed Numbers

The tok/s figures above are realistic estimates for the recommended model, not theoretical maximums. Real-world speed depends on:

  • Quantization — Q4 runs ~30% faster than Q8 on the same model
  • Context length — KV cache eats VRAM and slows inference as it fills
  • Batch size — single-user inference is bandwidth-bound; batched serving is compute-bound

For OpenClaw specifically, tool-call accuracy matters more than tokens/sec. A 22 tok/s response that nails the JSON is better than 60 tok/s that drifts.
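
As a rough illustration of the bandwidth-bound point above, the sketch below estimates a ceiling on single-user decode speed for a dense model: every generated token has to read all of the weights once, so the ceiling is memory bandwidth divided by weight size. The function names are hypothetical, and the inputs in the example (about 4.8 bits per weight for a Q4-class quant, 936 GB/s for the RTX 3090) are assumptions for illustration, not measurements from this guide.

```python
def weight_size_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate size of the quantized weights in GB (1 GB = 1e9 bytes)."""
    return params_billion * bits_per_weight / 8


def decode_ceiling_tok_s(params_billion: float,
                         bits_per_weight: float,
                         bandwidth_gb_s: float) -> float:
    """Upper bound on single-user decode speed for a dense model.

    Assumes each generated token reads every weight exactly once and
    ignores KV-cache reads, so real throughput will be lower.
    """
    return bandwidth_gb_s / weight_size_gb(params_billion, bits_per_weight)


if __name__ == "__main__":
    # Assumed inputs: 27B dense model, ~4.8 bits/weight, 936 GB/s bandwidth.
    size = weight_size_gb(27, 4.8)
    ceiling = decode_ceiling_tok_s(27, 4.8, 936)
    print(f"weights: {size:.1f} GB, ceiling: {ceiling:.0f} tok/s")
```

Under these assumptions the ceiling comes out well above the ~35 tok/s quoted for the RTX 3090, which is consistent with the table showing realistic rather than theoretical numbers.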

VRAM Tier vs Model Pick

The pattern is consistent across GPUs:

| Available VRAM | Best Pick | For OpenClaw |
| --- | --- | --- |
| 8-12 GB | Qwen 3.5 9B (Q4 or Q5) | Not recommended; use cloud |
| 16 GB | Qwen 3.5 9B (Q8) or gpt-oss 20B (Q4) | gpt-oss 20B (Q4) |
| 24 GB | Qwen 3.6 27B (Q4_K_M) | gpt-oss 20B (Q5) |
| 32 GB | Qwen 3.6 27B (Q6) or 35B-A3B (Q5) | gpt-oss 20B (Q8) |
| 48 GB | GLM-5.1 32B (Q5) or Llama 3.3 70B (Q3) | Dual: gpt-oss 20B + Qwen 3.6 27B |
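
If you want to script the tier lookup, a minimal sketch might look like the following. The picks are copied from the table above; the `pick_for_vram` helper and the choice to round an in-between amount (say 20 GB) down to the next lower tier are assumptions made for this example.

```python
# (minimum VRAM in GB, best general pick, pick for OpenClaw),
# ordered from the largest tier down.
VRAM_TIERS = [
    (48, "GLM-5.1 32B (Q5) or Llama 3.3 70B (Q3)", "Dual: gpt-oss 20B + Qwen 3.6 27B"),
    (32, "Qwen 3.6 27B (Q6) or 35B-A3B (Q5)", "gpt-oss 20B (Q8)"),
    (24, "Qwen 3.6 27B (Q4_K_M)", "gpt-oss 20B (Q5)"),
    (16, "Qwen 3.5 9B (Q8) or gpt-oss 20B (Q4)", "gpt-oss 20B (Q4)"),
    (8, "Qwen 3.5 9B (Q4 or Q5)", "Not recommended, use cloud"),
]


def pick_for_vram(vram_gb: float):
    """Return (general pick, OpenClaw pick) for the highest tier that fits.

    Returns None below 8 GB, which the table does not cover.
    """
    for min_gb, general, openclaw in VRAM_TIERS:
        if vram_gb >= min_gb:
            return general, openclaw
    return None


if __name__ == "__main__":
    for gb in (12, 24, 48):
        print(gb, "GB ->", pick_for_vram(gb))
```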

OpenClaw Tool-Calling Reality Check

Most GPU guides talk about benchmark scores or raw tokens/sec. For OpenClaw, only one thing matters: does the model emit clean JSON for tool calls, hundreds of times in a row, without drift?

Models that pass this filter regardless of GPU:

  • gpt-oss 20B — cleanest tool-call JSON; safe production default
  • gpt-oss 120B — same, scaled up (needs 64+ GB VRAM)
  • Qwen 3.6 27B — fixed the Qwen 3.5 tool-calling regressions
  • Qwen 3.6 35B-A3B (MoE) — fast inference, reliable tools

Models to avoid for OpenClaw right now (regardless of how fast your GPU runs them):

  • Qwen 3.5 27B — known broken tool-calling in Ollama (GitHub issue #14493)
  • Anything under 7B at any quant — drifts under load
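
One way to run the "clean JSON, hundreds of times in a row" check yourself is to collect raw model outputs and measure how many parse as a well-formed tool call. The sketch below assumes a hypothetical tool-call shape (a JSON object with a string `name` and an object `arguments`); it is not OpenClaw's actual schema, so adjust the checks to whatever your setup expects.

```python
import json


def is_clean_tool_call(raw: str) -> bool:
    """True if raw is exactly one JSON object with the assumed shape.

    Assumed (hypothetical) shape: {"name": <str>, "arguments": <object>}.
    Any surrounding prose or malformed JSON counts as drift.
    """
    try:
        obj = json.loads(raw)
    except json.JSONDecodeError:
        return False
    if not isinstance(obj, dict):
        return False
    return isinstance(obj.get("name"), str) and isinstance(obj.get("arguments"), dict)


def pass_rate(outputs) -> float:
    """Fraction of outputs that are clean tool calls (0.0 for no outputs)."""
    outputs = list(outputs)
    if not outputs:
        return 0.0
    return sum(is_clean_tool_call(o) for o in outputs) / len(outputs)


if __name__ == "__main__":
    good = '{"name": "read_file", "arguments": {"path": "notes.txt"}}'
    chatty = 'Sure! Here is the call: ' + good           # prose around the JSON
    wrong_type = '{"name": "read_file", "arguments": "path=notes.txt"}'
    print(pass_rate([good, chatty, wrong_type, good]))   # 2 of 4 are clean
```

Feeding a few hundred captured responses per model into `pass_rate` gives a single number to compare against tokens/sec when choosing between candidates.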
