
Best Local LLM for RTX 4090 (2026): 24GB VRAM Picks + OpenClaw Setup

The RTX 4090 is the bandwidth king for 24 GB workloads. Its 1008 GB/s of memory bandwidth runs Qwen 3.6 27B at Q4 at ~50 tokens/sec, about 40% faster than an RTX 3090 on the same model. If you bought a 4090 for gaming, OpenClaw and Ollama turn it into a serious local AI rig.
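
A rough sanity check on that figure, offered as a sketch rather than a benchmark: token generation on a single GPU is usually limited by how fast the weights can be streamed from VRAM, so memory bandwidth divided by the model's size gives an approximate ceiling for a dense model. The function name is illustrative, and the ~17 GB footprint is this guide's Q4_K_M estimate (it includes some context, so the true ceiling is a little higher).

```python
def decode_ceiling_tok_per_sec(bandwidth_gb_s: float, weights_gb: float) -> float:
    """Rule-of-thumb upper bound: each generated token reads all dense weights once."""
    return bandwidth_gb_s / weights_gb


RTX_4090_BANDWIDTH_GB_S = 1008  # memory bandwidth quoted in this guide
QWEN_27B_Q4_GB = 17             # approximate Q4_K_M footprint (assumption)

ceiling = decode_ceiling_tok_per_sec(RTX_4090_BANDWIDTH_GB_S, QWEN_27B_Q4_GB)
print(f"Theoretical ceiling: ~{ceiling:.0f} tok/sec")
```

Real throughput lands below this ceiling once compute and KV-cache reads are counted, so a measured ~50 tok/sec is plausible for this model size.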

Bottom Line

  • Best overall pick: Qwen 3.6 27B at Q4_K_M (~50 tok/sec, sweet spot)
  • Best for OpenClaw production: gpt-oss 20B at Q5_K_M
  • Best fast pick: Qwen 3.6 35B-A3B (MoE) at Q5_K_M (~70 tok/sec)
  • vs RTX 3090: ~40% faster on identical workloads, same 24 GB ceiling

Top Picks for RTX 4090 (24 GB VRAM, 1008 GB/s bandwidth)

1. Qwen 3.6 27B (Q4_K_M) — best overall

Released April 22, 2026. Outperforms the 397B Qwen 3.5 MoE on agentic coding (77.2 SWE-Bench Verified). About 17 GB VRAM at Q4_K_M with 32K context.

ollama pull qwen3.6:27b
openclaw config set agents.defaults.models.chat ollama/qwen3.6:27b

Expected speed on RTX 4090: 45-55 tokens/sec.
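
To check that range on your own card, one option is to read the timing fields Ollama returns with a non-streamed `/api/generate` response: `eval_count` (generated tokens) and `eval_duration` (nanoseconds). The sketch below only does the arithmetic. The helper name is hypothetical and the sample values are made up for illustration, not measured.

```python
def tokens_per_second(response: dict) -> float:
    """Compute generation speed from an Ollama /api/generate response body."""
    eval_count = response["eval_count"]           # tokens generated
    eval_duration_ns = response["eval_duration"]  # generation time in nanoseconds
    return eval_count / (eval_duration_ns / 1e9)


# Illustrative values only: 500 tokens generated in 10 seconds.
sample = {"eval_count": 500, "eval_duration": 10_000_000_000}
print(f"{tokens_per_second(sample):.1f} tok/sec")
```

Running `ollama run qwen3.6:27b --verbose` should print a comparable eval rate without any scripting.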

2. gpt-oss 20B (Q5_K_M) — best for OpenClaw production

OpenAI’s 20B at Q5 uses about 15 GB. Cleanest tool-call JSON of any open-weight model.

ollama pull gpt-oss:20b-q5_K_M
openclaw config set agents.defaults.models.chat ollama/gpt-oss:20b-q5_K_M

3. Qwen 3.6 35B-A3B (Q5_K_M) — fastest

MoE variant with 3B active params per token. At Q5_K_M it uses about 22 GB. Inference is blistering on the 4090: 65-75 tok/sec.

4. Qwen 3.6 27B (Q5_K_M) — premium quality squeeze

Q5_K_M of the same 27B model uses ~19 GB. Slight quality bump over Q4, ~20% slower (35-45 tok/sec vs 45-55). Worth it if your workload is reasoning-heavy.

What Fits in 24 GB VRAM (RTX 4090)

Model                    Quant    VRAM     Tok/sec
Qwen 3.6 27B             Q4_K_M   ~17 GB   45-55
Qwen 3.6 27B             Q5_K_M   ~19 GB   35-45
Qwen 3.6 35B-A3B (MoE)   Q5_K_M   ~22 GB   65-75
gpt-oss 20B              Q5_K_M   ~15 GB   55-70
Qwen 3.5 9B              Q8_0     ~10 GB   90-110
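
As a quick way to read the table, the sketch below subtracts each approximate footprint from the card's 24 GB to show how much room is left for longer context or other processes. The numbers are copied from the table above; the variable names are illustrative.

```python
VRAM_TOTAL_GB = 24  # RTX 4090

# (model, quant, approximate VRAM in GB), copied from the table above
picks = [
    ("Qwen 3.6 27B", "Q4_K_M", 17),
    ("Qwen 3.6 27B", "Q5_K_M", 19),
    ("Qwen 3.6 35B-A3B (MoE)", "Q5_K_M", 22),
    ("gpt-oss 20B", "Q5_K_M", 15),
    ("Qwen 3.5 9B", "Q8_0", 10),
]

# Free VRAM left after loading each pick
headroom = {(model, quant): VRAM_TOTAL_GB - vram for model, quant, vram in picks}

for (model, quant), free_gb in headroom.items():
    print(f"{model} {quant}: ~{free_gb} GB free")
```

The MoE pick leaves the least slack (about 2 GB), so it is the one to watch if you raise the context limit.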

OpenClaw Setup on RTX 4090

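# Pull both models
ollama pull qwen3.6:27b
ollama pull gpt-oss:20b-q5_K_M
# Chat model, with context capped at 64K to leave room for the KV cache
openclaw config set agents.defaults.models.chat ollama/qwen3.6:27b
openclaw config set agents.defaults.context_limit 65536
# Agent model: gpt-oss 20B (pick 2 above)
openclaw config set agents.defaults.models.agent ollama/gpt-oss:20b-q5_K_M
# Smoke test
openclaw chat "Refactor the auth module"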

Common Mistakes on RTX 4090

  1. Defaulting to Q8 because you can. Q5_K_M is near-FP16 quality. Q8 just halves your tokens/sec for imperceptible gain on 27B models.
  2. Running Llama 3.3 70B at IQ2. Qwen 3.6 27B at Q5 beats it on benchmarks for half the VRAM. The 70B obsession is mostly outdated for 2026.
  3. Setting context to 128K. KV cache eats 8-12 GB on top of the model. You’ll OOM. Cap at 64K.
  4. Forgetting the 4090 pulls 450W. Use a 1000W+ PSU with at least 100W headroom for sustained inference loads.
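
The arithmetic behind mistake 3 can be sketched with the figures used in this guide: roughly 17 GB for Qwen 3.6 27B at Q4_K_M plus the 8-12 GB KV cache quoted for 128K context. This is a back-of-the-envelope check, not a measurement. The function name is hypothetical, and because the 17 GB figure already includes some context, the totals slightly overstate real usage.

```python
VRAM_TOTAL_GB = 24  # RTX 4090


def fits_in_vram(model_gb: float, kv_cache_gb: float, total_gb: float = VRAM_TOTAL_GB) -> bool:
    """Return True if model weights plus KV cache stay within the card's VRAM."""
    return model_gb + kv_cache_gb <= total_gb


MODEL_GB = 17  # Qwen 3.6 27B at Q4_K_M, approximate

# Low and high ends of the 128K KV-cache range quoted in mistake 3.
for kv_gb in (8, 12):
    total = MODEL_GB + kv_gb
    verdict = "fits" if fits_in_vram(MODEL_GB, kv_gb) else "OOM"
    print(f"128K context, {kv_gb} GB KV cache: {total} GB -> {verdict}")
```

Both ends of the range exceed 24 GB, which is why the guide caps context at 64K.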
