
Best Local LLM for RTX 3090 (2026): 24GB VRAM Picks + OpenClaw Setup

The RTX 3090 is still the best value GPU for local LLMs in 2026. Its 24 GB of VRAM and 936 GB/s of memory bandwidth run Qwen 3.6 27B at Q4 comfortably, at ~35 tokens/sec. Used 3090s sell on eBay for $600-800, about half the price of a 4090, and deliver 90% of its LLM throughput on workloads that fit in 24 GB.
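Token generation is largely memory-bound, so a common rule of thumb puts the ceiling on decode speed at memory bandwidth divided by the bytes read per token. The sketch below applies that rule to the figures above. Treating the whole ~17 GB footprint as read once per token is a simplifying assumption, and `tok_ceiling` is an illustrative helper name, not part of any tool.

```shell
#!/bin/sh
# Rough decode-speed ceiling: bandwidth (GB/s) divided by GB read per token.
# Assumption: each generated token streams the full in-VRAM footprint once.
tok_ceiling() {
    # $1 = memory bandwidth in GB/s, $2 = GB read per token
    awk -v bw="$1" -v gb="$2" 'BEGIN { printf "%.1f\n", bw / gb }'
}

# RTX 3090 bandwidth and the ~17 GB figure quoted for Qwen 3.6 27B at Q4_K_M
tok_ceiling 936 17
```

With these inputs the ceiling lands around 55 tok/sec, so a measured ~35 tok/sec sits plausibly below it once compute and overhead are counted.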


Bottom Line

  • Best overall pick: Qwen 3.6 27B at Q4_K_M (~35 tok/sec)
  • Best for OpenClaw production: gpt-oss 20B at Q5_K_M (cleanest tool calls)
  • Best fast pick: Qwen 3.6 35B-A3B at IQ4_XS (MoE — ~50 tok/sec, 3B active params)
  • Skip: Llama 70B at any quant on a single 3090

Top Picks for RTX 3090 (24 GB VRAM)

1. Qwen 3.6 27B (Q4_K_M) — best overall

The April 22, 2026 release fits comfortably on the 3090: it uses about 17 GB of VRAM at Q4_K_M with 32K context. It outperforms the 397B Qwen 3.5 MoE on agentic coding benchmarks (77.2 on SWE-Bench Verified).

ollama pull qwen3.6:27b

openclaw config set agents.defaults.models.chat ollama/qwen3.6:27b
openclaw chat "Refactor this function and update the callers"

Expected speed on RTX 3090: 30-40 tokens/sec.
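To check that figure on your own card, `ollama run --verbose` prints an eval rate after each reply, and the final response from Ollama's generate API carries `eval_count` (tokens generated) and `eval_duration` (nanoseconds). A minimal sketch of the arithmetic, where `tok_per_sec` is a hypothetical helper and the sample numbers are made up:

```shell
#!/bin/sh
# Convert Ollama's eval_count / eval_duration pair into tokens per second.
tok_per_sec() {
    # $1 = eval_count (tokens), $2 = eval_duration (nanoseconds)
    awk -v n="$1" -v ns="$2" 'BEGIN { printf "%.1f\n", n * 1000000000 / ns }'
}

# Made-up sample: 350 tokens generated in 10 seconds
tok_per_sec 350 10000000000
```

Measure with a prompt long enough to produce a few hundred tokens; very short replies are dominated by load and prompt-processing time.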

2. gpt-oss 20B (Q5_K_M) — best for OpenClaw production

OpenAI's 20B model at Q5_K_M uses about 15 GB. It produces the cleanest tool-call JSON of any open model, which is exactly what OpenClaw autonomous loops need.

ollama pull gpt-oss:20b-q5_K_M
openclaw config set agents.defaults.models.chat ollama/gpt-oss:20b-q5_K_M
openclaw run --agent "Implement the spec end-to-end"

3. Qwen 3.6 35B-A3B (IQ4_XS) — fastest

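This is the Mixture-of-Experts variant of Qwen 3.6: 35B total parameters, but only 3B active per token. At IQ4_XS it uses about 19 GB. Because only the active experts are read for each token, it generates at small-model speed (~50 tok/sec on the RTX 3090) rather than at the pace of a dense 35B model.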

ollama pull qwen3.6:35b-iq4_xs

4. Nemotron Cascade 2 30B (Q4_K_M) — NVIDIA’s late-March 2026 release

A 30B dense model with a 256K context window, strong on structured output. It uses about 18 GB at Q4_K_M.

5. Mistral Small 3 22B (Q5_K_M) — alternative

About 16 GB at Q5. Good for European-language workloads, slightly weaker on code than Qwen 3.6.

What Fits in 24 GB VRAM

Model                    Quant    VRAM    Tok/sec
Qwen 3.6 27B             Q4_K_M   ~17 GB  30-40
Qwen 3.6 35B-A3B (MoE)   IQ4_XS   ~19 GB  45-55
gpt-oss 20B              Q5_K_M   ~15 GB  40-50
Nemotron Cascade 2 30B   Q4_K_M   ~18 GB  28-35
Qwen 3.5 9B              Q8_0     ~10 GB  60-80
Llama 3.3 70B            IQ2_XS   ~19 GB  8-12 (degraded)
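A quick way to sanity-check a footprint against the card is to compare it with total VRAM minus a reserve for the desktop and CUDA context. The sketch below does that; the 1.5 GB default reserve is an assumed value and `fits_in_vram` is a hypothetical helper, so tune both to your machine.

```shell
#!/bin/sh
# Report whether an estimated footprint fits in VRAM with a safety reserve.
fits_in_vram() {
    # $1 = estimated footprint (GB), $2 = total VRAM (GB), $3 = reserve (GB, assumed default 1.5)
    awk -v need="$1" -v total="$2" -v reserve="${3:-1.5}" \
        'BEGIN { if (need + reserve <= total) print "fits"; else print "too big" }'
}

fits_in_vram 17 24   # Qwen 3.6 27B at Q4_K_M, figure from the table
fits_in_vram 19 24   # Qwen 3.6 35B-A3B at IQ4_XS, figure from the table
fits_in_vram 23 24   # hypothetical footprint that leaves no reserve
```

For real numbers instead of estimates, `nvidia-smi --query-gpu=memory.total,memory.used --format=csv,noheader,nounits` reports both values in MiB while the model is loaded.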

OpenClaw Setup on RTX 3090

# 1. Pull Qwen 3.6 27B
ollama pull qwen3.6:27b

# 2. Wire it in
openclaw config set agents.defaults.models.chat ollama/qwen3.6:27b

# 3. Use 32K context (24GB has the headroom)
openclaw config set agents.defaults.context_limit 32768

# 4. For autonomous runs, prefer gpt-oss 20B (more reliable)
openclaw config set agents.defaults.models.agent ollama/gpt-oss:20b-q5_K_M

# 5. Smoke test
openclaw chat "List the three largest files in my home directory"
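Before running the smoke test, it can help to confirm that the pulls in the steps above actually landed. The sketch below scans `ollama list` output for a tag in the first column. `has_model` is a hypothetical helper, and the sample rows are made up to mimic the column layout rather than copied from a real run.

```shell
#!/bin/sh
# Succeed if the given model tag appears in `ollama list` output on stdin.
has_model() {
    # $1 = model tag to look for; the header row (NR == 1) is skipped
    awk -v want="$1" 'NR > 1 && $1 == want { found = 1 } END { exit !found }'
}

# Made-up sample rows in the `ollama list` column layout
sample='NAME            ID              SIZE     MODIFIED
qwen3.6:27b     000000000000    17 GB    2 days ago'

if printf '%s\n' "$sample" | has_model "qwen3.6:27b"; then
    echo "sample: qwen3.6:27b found"
fi

# Live check, only attempted when the ollama binary is on PATH
if command -v ollama >/dev/null 2>&1; then
    if ollama list | has_model "qwen3.6:27b"; then
        echo "qwen3.6:27b is installed"
    else
        echo "qwen3.6:27b missing: run ollama pull qwen3.6:27b"
    fi
fi
```

Once a chat is running, `ollama ps` shows which model is loaded and whether it sits fully on the GPU.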

Common Mistakes on RTX 3090

  1. Trying to run Llama 3.3 70B at IQ2. It technically fits, but quality collapses. Qwen 3.6 27B at Q4 beats it on every benchmark.
  2. Maxing context to 128K. The KV cache grows linearly with context length and eats VRAM fast: at 128K with a 27B Q4 model, you'll OOM before you fill the context. Cap at 32K and raise selectively.
  3. Picking Qwen 3.5 27B for OpenClaw. It has a tool-calling bug in Ollama (GitHub issue #14493). Always use Qwen 3.6 27B.
  4. Ignoring power supply headroom. The RTX 3090 is rated at 350 W and pulls close to that under sustained inference. Make sure your PSU has at least 100 W of headroom above total system draw, or the machine can become unstable or shut down on long runs.
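The context-length mistake is easier to see with numbers. KV cache size is roughly 2 (keys and values) × layers × KV heads × head dimension × context tokens × bytes per element. The sketch below evaluates that formula. The architecture values (48 layers, 4 KV heads, head dimension 128, 16-bit cache) are placeholders chosen for illustration, not the published configuration of any model named here, and `kv_cache_gib` is a hypothetical helper.

```shell
#!/bin/sh
# Estimate KV cache size in GiB for a given architecture and context length.
kv_cache_gib() {
    # $1 = layers, $2 = KV heads, $3 = head dim, $4 = context tokens,
    # $5 = bytes per cache element (assumed default 2, i.e. 16-bit)
    awk -v l="$1" -v h="$2" -v d="$3" -v ctx="$4" -v b="${5:-2}" \
        'BEGIN { printf "%.1f\n", 2 * l * h * d * ctx * b / 1073741824 }'
}

# Placeholder architecture, NOT a real model's published config
kv_cache_gib 48 4 128 32768    # 32K context
kv_cache_gib 48 4 128 131072   # 128K context
```

Under these placeholder values the cache grows from 3 GiB at 32K to 12 GiB at 128K. Added to a weights footprint in the high teens, that no longer fits in 24 GB, which is why the 32K cap is the safer default.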

