Best Local LLM by RAM (April 2026): 8GB to 128GB Hardware Picks
Your RAM is the single biggest constraint on which local LLM you can run. The April 2026 landscape moved fast: Qwen 3.6 27B (released April 22) now outperforms 397B-parameter MoE models on agentic coding benchmarks, gpt-oss has the cleanest tool-call output for OpenClaw, and Llama 3.3 70B is no longer a headline pick. This hub maps every common RAM tier (8GB through 128GB) to the best model that actually fits today.
Need help picking the right model for your hardware?
Book a Call at calendly.com/cloudyeti/meet. We'll match your RAM to the right model and quant in 30 minutes.
Pick Your RAM Tier (April 2026)
| Your RAM | Best Pick | Best For OpenClaw | Detailed Guide |
|---|---|---|---|
| 8 GB | Qwen 3.5 4B (Q5_K_M) | Not recommended — use cloud | 8GB guide → |
| 16 GB | Qwen 3.5 9B (Q5_K_M) | gpt-oss 20B (Q4) | 16GB guide → |
| 24 GB | Qwen 3.6 27B (Q4_K_M) ← NEW | gpt-oss 20B (Q5) | 24GB guide → |
| 32 GB | Qwen 3.6 27B (Q6_K) | Qwen 3.6 27B / gpt-oss 20B (Q8) | 32GB guide → |
| 48 GB | Qwen 3.6 35B-A3B (Q5) | Qwen 3.6 27B (Q8) | 48GB guide → |
| 64 GB | gpt-oss 120B (Q4_K_M) | gpt-oss 120B / Mistral Small 4 (119B-A6B) | 64GB guide → |
| 96 GB | Qwen 3.5 122B-A10B (Q4_K_M) | gpt-oss 120B (Q5) | 96GB guide → |
| 128 GB | gpt-oss 120B (Q6_K) | gpt-oss 120B (Q8) | 128GB guide → |
What Changed in April 2026
The local LLM landscape shifted hard between February and April 2026:
- Qwen 3.6 27B (April 22) — Dense 27B that outperforms the 397B Qwen 3.5 MoE on agentic coding (77.2 vs 76.x on SWE-Bench Verified). The new default for 24-48GB tiers.
- DeepSeek V4 / V4 Pro (April 24) — Cloud-class, not realistic for local hosts at any consumer RAM tier.
- GLM-5.1 (April 7) — 744B MoE from Z.ai. Cloud-only. (Earlier guides citing “GLM-5.1 32B” were referring to the older GLM-4 line, not 5.1.)
- Mistral Small 4 (March 16) — 119B-A6B MoE that fits at Q4 in about 60GB. Replaces Mistral Large 123B.
- Qwen 3.5 small series (March 2) — 0.8B / 2B / 4B / 9B variants. The 9B is the new 16GB tier pick.
- Qwen 3.5 medium (February 24) — 27B dense, 35B-A3B MoE, 122B-A10B MoE. The 35B-A3B MoE is excellent at 48GB.
- Llama 3.3 70B — Still works, no longer the default. The Qwen and gpt-oss families have caught up at smaller sizes.
How to Use This Guide
Step 1: Find your usable RAM, not your installed RAM. On Apple Silicon Macs, the GPU shares unified memory with the OS, which reserves roughly 4-6GB for itself. On Windows or Linux with a discrete NVIDIA GPU, the relevant number is VRAM (the GPU's onboard memory), not system RAM.
Step 2: Subtract context overhead. A 32K context window costs roughly 4-6GB. A 128K window costs 16-24GB. Model weights are not the only thing that has to fit.
Step 3: Pick the highest-quality quant that leaves headroom. Q5_K_M is the sweet spot; Q4_K_M is the safe standard when RAM is tight. Anything below Q3 starts to hurt tool calling, which kills agent runs.
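The three steps above reduce to a quick budget calculation. Here is a minimal sketch using this guide's rules of thumb (bits-per-weight figures from the cheat sheet below, ~4-6GB for a 32K context); the function name and the 2GB headroom default are illustrative, not a standard API.

```python
def fits_in_ram(param_count_b, bits_per_weight, context_gb, usable_ram_gb,
                headroom_gb=2.0):
    """Rule-of-thumb check: do quantized weights + context fit with headroom?

    param_count_b: model parameters in billions
    bits_per_weight: e.g. 4.5 for Q4_K_M, 5.5 for Q5_K_M (approximate)
    context_gb: KV-cache budget (this guide's rule of thumb: ~4-6GB at 32K)
    """
    weights_gb = param_count_b * bits_per_weight / 8  # GB ≈ B-params × bits / 8
    return weights_gb + context_gb + headroom_gb <= usable_ram_gb

# Example: Qwen 3.6 27B at Q4_K_M (~4.5 bits) with a 32K context on 24GB usable RAM:
# 27 × 4.5 / 8 ≈ 15.2GB weights + 5GB context + 2GB headroom ≈ 22.2GB
print(fits_in_ram(27, 4.5, 5.0, 24))  # True
```

The same check explains why the 24GB tier tops out at Q4_K_M: the Q6_K version of the same model only clears the budget once you move to 32GB.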
OpenClaw Tool-Calling Reality Check (April 2026)
Most local LLM guides talk about benchmark scores. For OpenClaw, only one metric matters: does the model emit valid JSON when asked to call a tool, hundreds of times in a row, without drift?
Models that pass this filter today:
- gpt-oss 20B — cleanest tool-call JSON in production, this is the safe default
- gpt-oss 120B — same family, scaled up
- Qwen 3.6 27B — fixed the tool-calling regressions from 3.5
- Qwen 3.6 35B-A3B (MoE) — fast inference with reliable tools
- Llama 3.3 70B — still fine for tool calls
- Mistral Small 4 (119B-A6B) — works, but heavier than gpt-oss
Models to avoid for OpenClaw right now:
- Qwen 3.5 27B — known broken tool-calling in Ollama (GitHub issue #14493)
- Anything under 7B — too unreliable for autonomous loops
- Most fine-tunes of base models
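You can run this filter yourself before trusting a model with an agent loop. The sketch below validates a batch of raw model responses collected from your local runtime (e.g. via Ollama's API); the required keys and the example tool name are illustrative assumptions, since the exact schema depends on your OpenClaw configuration.

```python
import json

# Assumed tool-call shape: {"name": ..., "arguments": {...}} (adjust to your schema)
REQUIRED_KEYS = {"name", "arguments"}

def is_valid_tool_call(raw: str) -> bool:
    """Check one model response for a well-formed tool call:
    parseable JSON, expected keys, and arguments that are themselves an object."""
    try:
        call = json.loads(raw)
    except json.JSONDecodeError:
        return False
    return (isinstance(call, dict)
            and REQUIRED_KEYS <= call.keys()
            and isinstance(call["arguments"], dict))

def drift_rate(responses) -> float:
    """Fraction of responses that fail validation across a long run.
    For agent duty you want this at or near 0.0 over hundreds of calls."""
    bad = sum(not is_valid_tool_call(r) for r in responses)
    return bad / len(responses)

# Example: two clean calls, one where the model drifted into prose
good = '{"name": "read_file", "arguments": {"path": "README.md"}}'
bad = 'Sure! I will call read_file now.'
print(drift_rate([good, good, bad]))  # 1 failure out of 3
```

Run it over a few hundred real responses, not three: drift tends to show up late in a session, which is exactly what benchmark scores miss.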
Quantization Cheat Sheet
| Quant | Bits/weight | Quality | When to use |
|---|---|---|---|
| Q8_0 | 8 | Near-FP16 | When you have 2x the model size in RAM |
| Q5_K_M | ~5.5 | Nearly indistinguishable from Q8 | Best quality-to-size ratio |
| Q4_K_M | ~4.5 | Loses 1-3% on benchmarks | Standard pick when RAM is tight |
| IQ3_XS | ~3.3 | Noticeable degradation, MoE-friendly | Squeeze a bigger model into too-little RAM |
| Q2_K | ~2.6 | Significantly degraded | Last resort, breaks tool calling |
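The bits/weight column converts directly into an approximate weight size. A quick sketch using the table's figures; real GGUF files add a few percent for metadata and mixed-precision layers, so treat these as lower bounds.

```python
# Approximate in-RAM weight size per quant, from the bits/weight column above.
QUANT_BITS = {"Q8_0": 8.0, "Q5_K_M": 5.5, "Q4_K_M": 4.5,
              "IQ3_XS": 3.3, "Q2_K": 2.6}

def quant_size_gb(params_billion: float, quant: str) -> float:
    """Weight size in GB ≈ billions of parameters × bits per weight / 8."""
    return round(params_billion * QUANT_BITS[quant] / 8, 1)

# Sizes for a 27B dense model across the table's quants
for q in QUANT_BITS:
    print(q, quant_size_gb(27, q), "GB")  # e.g. Q4_K_M → ~15.2 GB
```

Remember to add the context window and a couple of gigabytes of headroom on top of these numbers before declaring a fit.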
See Also
- Best Local Models for OpenClaw — model-first comparison
- OpenClaw Mac Mini Setup — turn a Mac mini into an always-on host
- OpenClaw Costs Guide — when local pays back the hardware
- OpenClaw Troubleshooting — fixes for common local-model issues
Get guides like this in your inbox every Wednesday.
No spam. Unsubscribe anytime.
You'll probably need this again.
Press Cmd+D (Mac) or Ctrl+D (Windows) to bookmark this page.
Need help with your OpenClaw setup?
We do remote setup, troubleshooting, and training worldwide.
Book a Call