
Best Local LLM by RAM (April 2026): 8GB to 128GB Hardware Picks

Your RAM is the single biggest constraint on which local LLM you can run. The April 2026 landscape moved fast: Qwen 3.6 27B (released April 22) now outperforms 397B-parameter MoE models on agentic coding benchmarks, gpt-oss has the cleanest tool-call output for OpenClaw, and Llama 3.3 70B is no longer a headline pick. This hub maps every common RAM tier (8GB through 128GB) to the best model that actually fits today.

Need help picking the right model for your hardware?

Book a Call at calendly.com/cloudyeti/meet. We'll match your RAM to the right model and quant in 30 minutes.

Pick Your RAM Tier (April 2026)

| Your RAM | Best Pick | Best for OpenClaw | Detailed Guide |
| --- | --- | --- | --- |
| 8 GB | Qwen 3.5 4B (Q5_K_M) | Not recommended — use cloud | 8GB guide → |
| 16 GB | Qwen 3.5 9B (Q5_K_M) | gpt-oss 20B (Q4) | 16GB guide → |
| 24 GB | Qwen 3.6 27B (Q4_K_M) ← NEW | gpt-oss 20B (Q5) | 24GB guide → |
| 32 GB | Qwen 3.6 27B (Q6_K) | Qwen 3.6 27B / gpt-oss 20B (Q8) | 32GB guide → |
| 48 GB | Qwen 3.6 35B-A3B (Q5) | Qwen 3.6 27B (Q8) | 48GB guide → |
| 64 GB | gpt-oss 120B (Q4_K_M) | gpt-oss 120B / Mistral Small 4 (119B-A6B) | 64GB guide → |
| 96 GB | Qwen 3.5 122B-A10B (Q4_K_M) | gpt-oss 120B (Q5) | 96GB guide → |
| 128 GB | gpt-oss 120B (Q6_K) | gpt-oss 120B (Q8) | 128GB guide → |
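
If you script your model choice, the table collapses to a small lookup. A minimal Python sketch, with tier keys and picks copied from the "Best for OpenClaw" column above:

```python
# The "Best for OpenClaw" column as a lookup table.
# Names copied verbatim from the tier table above.
OPENCLAW_PICK = {
    16: "gpt-oss 20B (Q4)",
    24: "gpt-oss 20B (Q5)",
    32: "Qwen 3.6 27B / gpt-oss 20B (Q8)",
    48: "Qwen 3.6 27B (Q8)",
    64: "gpt-oss 120B / Mistral Small 4 (119B-A6B)",
    96: "gpt-oss 120B (Q5)",
    128: "gpt-oss 120B (Q8)",
}

def pick(ram_gb: int) -> str:
    """Largest tier that fits under your RAM; below 16GB, use cloud."""
    tiers = [t for t in sorted(OPENCLAW_PICK) if t <= ram_gb]
    return OPENCLAW_PICK[tiers[-1]] if tiers else "Not recommended; use cloud"

print(pick(36))  # 32GB tier: Qwen 3.6 27B / gpt-oss 20B (Q8)
```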

What Changed in April 2026

The local LLM landscape shifted hard between February and April 2026:

  • Qwen 3.6 27B (April 22) — Dense 27B that outperforms the 397B Qwen 3.5 MoE on agentic coding (77.2 vs 76.x on SWE-Bench Verified). The new default for 24-48GB tiers.
  • DeepSeek V4 / V4 Pro (April 24) — Cloud-class, not realistic for local hosts at any consumer RAM tier.
  • GLM-5.1 (April 7) — 744B MoE from Z.ai. Cloud-only. (Earlier guides citing “GLM-5.1 32B” were referring to the older GLM-4 line, not 5.1.)
  • Mistral Small 4 (March 16) — 119B-A6B MoE that fits at Q4 in about 60GB. Replaces Mistral Large 123B.
  • Qwen 3.5 small series (March 2) — 0.8B / 2B / 4B / 9B variants. The 9B is the new 16GB tier pick.
  • Qwen 3.5 medium (February 24) — 27B dense, 35B-A3B MoE, 122B-A10B MoE. The 35B-A3B MoE is excellent at 48GB.
  • Llama 3.3 70B — Still works, no longer the default. The Qwen and gpt-oss families have caught up at smaller sizes.

How to Use This Guide

Step 1: Find your usable RAM, not your installed RAM. On Mac, the OS reserves 4-6GB. On Windows or Linux with an NVIDIA GPU, the relevant number is VRAM (the GPU’s onboard memory), not system RAM.

Step 2: Subtract context overhead. A 32K context window costs roughly 4-6GB. A 128K window costs 16-24GB. Model weights are not the only thing that has to fit.

Step 3: Pick the highest-quality quant that leaves headroom. Q5_K_M is the sweet spot; Q4_K_M is the standard pick when RAM is tight. Anything below Q3 starts to hurt tool calling, which kills agent runs. The sketch below puts all three steps together.
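
Here is a back-of-the-envelope fit check in Python. The constants are this guide's ballpark figures (4-6GB OS reserve, roughly 0.15GB per 1K tokens of context, bits-per-weight from the cheat sheet below), not measurements from any particular runtime:

```python
# Back-of-the-envelope RAM fit check. All constants are ballpark figures
# from this guide, not measurements from a specific runtime.

QUANT_BPW = {  # approximate bits per weight (see cheat sheet below)
    "Q8_0": 8.0,
    "Q5_K_M": 5.5,
    "Q4_K_M": 4.5,
    "IQ3_XS": 3.3,
    "Q2_K": 2.6,
}

def weights_gb(params_b: float, quant: str) -> float:
    """Weight footprint: params (billions) x bits-per-weight / 8."""
    return params_b * QUANT_BPW[quant] / 8

def context_gb(context_k: int) -> float:
    """Context overhead at ~0.15 GB per 1K tokens (midpoint of the
    4-6GB-at-32K figure above)."""
    return context_k * 0.15

def fits(installed_gb: float, params_b: float, quant: str,
         context_k: int = 32, os_reserve_gb: float = 5.0) -> bool:
    """Step 1: usable RAM; Step 2: context overhead; Step 3: quant size."""
    usable = installed_gb - os_reserve_gb
    return weights_gb(params_b, quant) + context_gb(context_k) <= usable

# Qwen 3.6 27B at Q8_0 with a 32K window on a 48GB machine:
# ~27GB weights + ~4.8GB context vs ~43GB usable.
print(fits(48, 27, "Q8_0"))  # True
```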

OpenClaw Tool-Calling Reality Check (April 2026)

Most local LLM guides talk about benchmark scores. For OpenClaw, only one metric matters: does the model emit valid JSON when asked to call a tool, hundreds of times in a row, without drift?
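
If you want to run that check yourself, here is a minimal sketch against a local Ollama server. The model tag, tool schema, and response shape are assumptions; adjust them for your setup and Ollama version:

```python
import json
import requests  # pip install requests

MODEL = "gpt-oss:20b"  # assumed tag; substitute whatever you run locally
OLLAMA_URL = "http://localhost:11434/api/chat"

# One toy tool is enough to measure format drift.
TOOLS = [{
    "type": "function",
    "function": {
        "name": "read_file",
        "description": "Read a file from disk",
        "parameters": {
            "type": "object",
            "properties": {"path": {"type": "string"}},
            "required": ["path"],
        },
    },
}]

def valid_tool_call(resp: dict) -> bool:
    """True if the reply contains a well-formed read_file call."""
    for call in resp.get("message", {}).get("tool_calls") or []:
        fn = call.get("function", {})
        args = fn.get("arguments", {})
        if isinstance(args, str):  # some versions return a JSON string
            try:
                args = json.loads(args)
            except ValueError:
                continue
        if fn.get("name") == "read_file" and "path" in args:
            return True
    return False

N, passes = 200, 0
for _ in range(N):
    r = requests.post(OLLAMA_URL, json={
        "model": MODEL,
        "messages": [{"role": "user", "content": "Read the file config.yaml"}],
        "tools": TOOLS,
        "stream": False,
    })
    passes += valid_tool_call(r.json())
print(f"{passes}/{N} well-formed tool calls")
```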

Models that pass this filter today:

  • gpt-oss 20B — cleanest tool-call JSON in production; the safe default
  • gpt-oss 120B — same family, scaled up
  • Qwen 3.6 27B — fixed the tool-calling regressions from 3.5
  • Qwen 3.6 35B-A3B (MoE) — fast inference with reliable tools
  • Llama 3.3 70B — still fine for tool calls
  • Mistral Small 4 (119B-A6B) — works, but heavier than gpt-oss

Models to avoid for OpenClaw right now:

  • Qwen 3.5 27B — known broken tool-calling in Ollama (GitHub issue #14493)
  • Anything under 7B — too unreliable for autonomous loops
  • Most fine-tunes of base models

Quantization Cheat Sheet

| Quant | Bits/weight | Quality | When to use |
| --- | --- | --- | --- |
| Q8_0 | 8 | Near-FP16 | When you have 2x the model size in RAM |
| Q5_K_M | ~5.5 | Indistinguishable from Q8 | Best quality-to-size ratio |
| Q4_K_M | ~4.5 | Loses 1-3% on benchmarks | Standard pick when RAM is tight |
| IQ3_XS | ~3.3 | Noticeable degradation, MoE-friendly | Squeeze a bigger model into too-little RAM |
| Q2_K | ~2.6 | Significantly degraded | Last resort, breaks tool calling |
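
To turn the bits-per-weight column into file sizes, multiply parameter count (in billions) by bits per weight and divide by 8. A quick sketch for a 27B dense model:

```python
# GB estimate per quant for a 27B dense model: params_b * bits / 8.
for quant, bpw in [("Q8_0", 8.0), ("Q5_K_M", 5.5), ("Q4_K_M", 4.5),
                   ("IQ3_XS", 3.3), ("Q2_K", 2.6)]:
    print(f"{quant:8s} ~{27 * bpw / 8:.1f} GB")
# Q8_0     ~27.0 GB ... Q4_K_M ~15.2 GB ... Q2_K ~8.8 GB
```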


Read next

Best Local LLMs for 128GB RAM (April 2026): gpt-oss 120B Q6 & Mistral Small 4 Q6
Best local LLMs for 128GB RAM in April 2026. gpt-oss 120B at Q6_K, Mistral Small 4 (119B-A6B) at Q6, Qwen 3.5 122B-A10B at Q5, and quad-model setups. Mac Studio Ultra territory.
Best Local LLMs for 16GB RAM (April 2026): Qwen 3.5 9B & gpt-oss 20B
Best local LLMs that run well on 16GB RAM in April 2026. Verified picks: Qwen 3.5 9B (Q8), gpt-oss 20B (Q4), Qwen 3.6 27B (squeeze IQ3), with quantization, speed, and OpenClaw setup.
Best Local LLMs for 24GB RAM (April 2026): Qwen 3.6 27B Headlines
Best local LLMs for 24GB RAM in April 2026. Qwen 3.6 27B (released Apr 22) is the new headline pick — outperforms 397B MoE models on agentic coding. Plus gpt-oss 20B, Qwen 3.5 9B at Q8.