
Best Local LLMs for 24GB RAM (April 2026): Qwen 3.6 27B Headlines

24GB is the most popular tier for serious local LLM work in April 2026, and the brand-new Qwen 3.6 27B (released April 22, 2026) just made it the sweet spot. Qwen 3.6 27B at Q4_K_M is 16.8GB on disk (expect roughly 19GB of RAM in use once context is loaded), runs at about 25.6 tokens per second on Apple M-series, and outperforms the 397B Qwen 3.5 MoE on agentic coding benchmarks. This is the new headline pick for Mac Mini 24GB and RTX 3090/4090 owners.

Mac Mini 24GB owner running OpenClaw?

Book a Call at calendly.com/cloudyeti/meet. We'll get Qwen 3.6 27B humming on your unified memory.

Bottom Line (April 2026)

  • Best overall pick: Qwen 3.6 27B at Q4_K_M — released April 22, 2026
  • Best for OpenClaw production: gpt-oss 20B at Q5_K_M (cleanest tool calls)
  • Best for fast inference: Qwen 3.6 35B-A3B (MoE — 3B active params per token)
  • Best premium quality: Qwen 3.5 9B at Q8_0 with 128K context

Top Picks for 24GB RAM

1. Qwen 3.6 27B (Q4_K_M) — the new headline (April 22, 2026)

The most important local LLM release of April 2026. Dense 27B model that scores 77.2 on SWE-Bench Verified, outperforming the 397B Qwen 3.5 MoE on agentic coding. About 16.8GB on disk at Q4_K_M (roughly 19GB of RAM in use), and it runs at 25.6 tokens per second on Apple M-series.

ollama pull qwen3.6:27b

openclaw config set agents.defaults.models.chat ollama/qwen3.6:27b
openclaw chat "Refactor this function and update the callers"

This is the model that made Llama 3.3 70B feel old.

2. gpt-oss 20B (Q5_K_M) — best for OpenClaw production

OpenAI’s open-weight 20B at Q5_K_M is about 14GB on disk (roughly 17GB of RAM in use). It produces the cleanest tool-call JSON output of any open-weight model, which is exactly what OpenClaw autonomous loops need. Pick this over Qwen 3.6 27B if your workload is heavily tool-call dependent.

ollama pull gpt-oss:20b-q5_K_M

openclaw config set agents.defaults.models.chat ollama/gpt-oss:20b-q5_K_M
openclaw run --agent "Implement the spec in features.md"

3. Qwen 3.6 35B-A3B (MoE) — fastest at this tier

The Qwen 3.6 Mixture-of-Experts variant: 35B total parameters but only 3B active per token, which means inference runs at roughly 8B-class speed (40-60 tok/sec on Apple Silicon). At IQ4_XS the model is about 18GB on disk (roughly 21GB of RAM in use).

ollama pull qwen3.6:35b
openclaw config set agents.defaults.models.chat ollama/qwen3.6:35b

Pick this if speed matters more than peak quality. The MoE design and Mac Mini’s unified memory are a perfect match.

4. Qwen 3.5 9B (Q8_0) — premium small model

If you want the highest-quality small model rather than a midsize one at Q4, Qwen 3.5 9B at Q8 uses about 11GB. Leaves you 12GB for context (128K is realistic) and other apps.

ollama pull qwen3.5:9b-q8_0

5. Nemotron Cascade 2 30B — NVIDIA’s recent drop

NVIDIA’s late-March 2026 release. 30B dense, strong on reasoning and structured output. About 19GB at Q4_K_M.

ollama pull nemotron-cascade-2:30b

What Fits in 24GB

Model                    Quant    RAM Used   Tool Calling
Qwen 3.6 27B             Q4_K_M   ~19 GB     Excellent
gpt-oss 20B              Q5_K_M   ~17 GB     Excellent (production)
Qwen 3.6 35B-A3B         IQ4_XS   ~21 GB     Excellent
Nemotron Cascade 2 30B   Q4_K_M   ~19 GB     Good
Qwen 3.5 9B              Q8_0     ~11 GB     Good
Qwen 3.5 4B              Q8_0     ~5 GB      Fair
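As a sanity check on the disk sizes quoted above, a quantized GGUF file is roughly parameter count times effective bits per weight, divided by 8. The bits-per-weight figures below (4.85 for Q4_K_M, 5.69 for Q5_K_M) are commonly cited averages, not exact specs, so treat this as a back-of-envelope sketch:

```shell
# Back-of-envelope GGUF size: params * effective bits-per-weight / 8.
# 4.85 and 5.69 bpw are approximate averages for these quants (assumption).
awk 'BEGIN {
  printf "Qwen 3.6 27B @ Q4_K_M: ~%.1f GB on disk\n", 27e9 * 4.85 / 8 / 1e9
  printf "gpt-oss 20B  @ Q5_K_M: ~%.1f GB on disk\n", 20e9 * 5.69 / 8 / 1e9
}'
```

Resident RAM runs a few GB higher than file size once the KV cache and runtime overhead are loaded, which is why the table shows ~19 GB for a 16.8GB file.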

OpenClaw Setup on 24GB Mac Mini

The Mac Mini 24GB is one of the best dedicated OpenClaw hosts you can buy. With the Qwen 3.6 release, the recipe is:

# 1. Pull Qwen 3.6 27B (the new headline model)
ollama pull qwen3.6:27b

# 2. Wire it into OpenClaw
openclaw config set agents.defaults.models.chat ollama/qwen3.6:27b

# 3. Set context to 32K (leaves headroom)
openclaw config set agents.defaults.context_limit 32000

# 4. For autonomous runs, prefer gpt-oss 20B (more reliable tool calls)
openclaw config set agents.defaults.models.agent ollama/gpt-oss:20b-q5_K_M

# 5. Smoke test
openclaw chat "List the three largest files in my home directory"

Common Mistakes at 24GB

  1. Picking Qwen 3.5 27B instead of 3.6 27B. The 3.5 has a tool-calling bug in Ollama (GitHub issue #14493) that breaks OpenClaw. Always 3.6.
  2. Defaulting to Llama 3.3 70B at IQ2. It used to be the headline pick at this tier. Qwen 3.6 27B at Q4 now beats it on every metric and fits comfortably.
  3. Forgetting to leave OS headroom on Mac Mini. macOS uses 4-6GB. Treat 24GB unified as 18-20GB available.
  4. Using the full 256K Qwen 3.6 context window. The KV cache alone eats 24GB+. Cap at 32K-64K and raise only if needed.
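The context-window warning can be sanity-checked with the standard KV-cache formula: 2 (K and V) x layers x KV heads x head dim x context length x bytes per element. The layer and head counts below are illustrative for a 27B-class dense model with grouped-query attention, not Qwen 3.6's published config:

```shell
# KV-cache size = 2 (K+V) * layers * kv_heads * head_dim * ctx_len * bytes/elem.
# Layer/head numbers are illustrative for a 27B-class GQA model (assumption).
LAYERS=48; KV_HEADS=8; HEAD_DIM=128; BYTES=2   # fp16 cache
for CTX in 32768 262144; do
  MIB=$((2 * LAYERS * KV_HEADS * HEAD_DIM * CTX * BYTES / 1024 / 1024))
  echo "ctx=${CTX}: KV cache ~${MIB} MiB"
done
```

Under these assumptions, 32K context costs about 6 GiB of cache on top of the model weights, which still fits in 24GB; 256K balloons to roughly 48 GiB, which is why capping the context matters.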

Hardware That Actually Hits 24GB

  • Apple Mac mini M4 (24GB) — best dedicated OpenClaw host
  • M2/M3/M4 Pro MacBook Pro (24GB)
  • NVIDIA RTX 3090 24GB / RTX 4090 24GB — fastest discrete option
  • NVIDIA RTX A5000 24GB — workstation card




Read next

Best Local LLM by RAM (April 2026): 8GB to 128GB Hardware Picks
Pick the best local LLM for your exact RAM. April 2026 picks featuring Qwen 3.6 27B, gpt-oss 20B/120B, Mistral Small 4, and Nemotron Cascade 2 with quantization, speed, and OpenClaw setup.
Best Local LLMs for 128GB RAM (April 2026): gpt-oss 120B Q6 & Mistral Small 4 Q6
Best local LLMs for 128GB RAM in April 2026. gpt-oss 120B at Q6_K, Mistral Small 4 (119B-A6B) at Q6, Qwen 3.5 122B-A10B at Q5, and quad-model setups. Mac Studio Ultra territory.
Best Local LLMs for 16GB RAM (April 2026): Qwen 3.5 9B & gpt-oss 20B
Best local LLMs that run well on 16GB RAM in April 2026. Verified picks: Qwen 3.5 9B (Q8), gpt-oss 20B (Q4), Qwen 3.6 27B (squeeze IQ3), with quantization, speed, and OpenClaw setup.