Best Local LLM for RTX 3090 (2026): 24GB VRAM Picks + OpenClaw Setup
The RTX 3090 is still the best-value GPU for local LLMs in 2026. Its 24 GB of VRAM and 936 GB/s of memory bandwidth run Qwen 3.6 27B at Q4 comfortably, at about 35 tokens/sec. Used 3090s sell for $600-800 on eBay, about half the price of a 4090, and deliver roughly 90% of its LLM throughput on workloads that fit in 24 GB.
Bottom Line
- Best overall pick: Qwen 3.6 27B at Q4_K_M (~35 tok/sec)
- Best for OpenClaw production: gpt-oss 20B at Q5_K_M (cleanest tool calls)
- Best fast pick: Qwen 3.6 35B-A3B at IQ4_XS (MoE — ~50 tok/sec, 3B active params)
- Skip: Llama 70B at any quant on a single 3090
Top Picks for RTX 3090 (24 GB VRAM)
1. Qwen 3.6 27B (Q4_K_M) — best overall
Released April 22, 2026, it fits comfortably on the 3090: about 17 GB of VRAM at Q4_K_M with 32K context. It outperforms the 397B Qwen 3.5 MoE on agentic coding benchmarks (77.2 on SWE-Bench Verified).
    ollama pull qwen3.6:27b
    openclaw config set agents.defaults.models.chat ollama/qwen3.6:27b
    openclaw chat "Refactor this function and update the callers"
Expected speed on RTX 3090: 30-40 tokens/sec.
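A quick sanity check on that figure: token generation on a dense model is usually limited by memory bandwidth, because the weights are read roughly once per generated token. The sketch below divides bandwidth by model footprint to get a rough ceiling. The function name `decode_ceiling_tps` is made up for illustration, and the estimate ignores KV-cache reads and kernel overhead, so treat it as an upper bound rather than a prediction.

```shell
# Rough upper bound on decode speed for a memory-bound dense model:
#   tokens/sec <= memory bandwidth (GB/s) / bytes read per token (GB)
# Assumption: every weight is read once per generated token.
decode_ceiling_tps() {
  # $1 = memory bandwidth in GB/s, $2 = model footprint in GB
  awk -v bw="$1" -v size="$2" 'BEGIN { printf "%.0f\n", bw / size }'
}

# RTX 3090 bandwidth and the ~17 GB footprint quoted above
decode_ceiling_tps 936 17   # prints 55
```

The quoted 30-40 tokens/sec sits below that ceiling of about 55, which is what you would expect once real-world overhead is included. A Mixture-of-Experts model reads only its active parameters per token, which is one plausible reason a 3B-active model generates faster than a 27B dense one.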
2. gpt-oss 20B (Q5_K_M) — best for OpenClaw production
OpenAI’s 20B at Q5 uses about 15 GB. Cleanest tool-call JSON of any open model — exactly what OpenClaw autonomous loops need.
    ollama pull gpt-oss:20b-q5_K_M
    openclaw config set agents.defaults.models.chat ollama/gpt-oss:20b-q5_K_M
    openclaw run --agent "Implement the spec end-to-end"
3. Qwen 3.6 35B-A3B (IQ4_XS) — fastest
Mixture-of-Experts variant of Qwen 3.6: 35B total params, 3B active per token. At IQ4_XS it uses about 19 GB. Inference runs at 8B-class speed (~50 tok/sec on RTX 3090).
ollama pull qwen3.6:35b-iq4_xs
4. Nemotron Cascade 2 30B (Q4_K_M) — NVIDIA’s late-March 2026 release
30B dense, 256K context, strong on structured output. About 18 GB at Q4_K_M.
5. Mistral Small 3 22B (Q5_K_M) — alternative
About 16 GB at Q5. Good for European-language workloads, slightly weaker on code than Qwen 3.6.
What Fits in 24 GB VRAM
| Model | Quant | VRAM | Tok/sec |
|---|---|---|---|
| Qwen 3.6 27B | Q4_K_M | ~17 GB | 30-40 |
| Qwen 3.6 35B-A3B (MoE) | IQ4_XS | ~19 GB | 45-55 |
| gpt-oss 20B | Q5_K_M | ~15 GB | 40-50 |
| Nemotron Cascade 2 30B | Q4_K_M | ~18 GB | 28-35 |
| Qwen 3.5 9B | Q8_0 | ~10 GB | 60-80 |
| Llama 3.3 70B | IQ2_XS | ~19 GB | 8-12 (degraded) |
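The VRAM column can be roughly reproduced from parameter count and quantization level. The sketch below is an estimate only: `weights_gb` is a hypothetical helper, and the bits-per-weight values (about 4.85 for Q4_K_M, about 4.25 for IQ4_XS) are approximate averages for llama.cpp-style quantizations that vary from model to model.

```shell
# Estimate weight size in GB: parameters (billions) * bits per weight / 8.
# This covers the weights only; KV cache and runtime buffers come on top.
weights_gb() {
  # $1 = parameters in billions, $2 = average bits per weight (approximate)
  awk -v p="$1" -v bpw="$2" 'BEGIN { printf "%.1f\n", p * bpw / 8 }'
}

weights_gb 27 4.85   # Qwen 3.6 27B at Q4_K_M      -> 16.4
weights_gb 35 4.25   # Qwen 3.6 35B-A3B at IQ4_XS  -> 18.6
```

Both results land just under the table's ~17 GB and ~19 GB, and the gap is the space taken by context and buffers. If an estimate comes out near 24 GB, the model will probably not leave enough room for a useful context window.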
OpenClaw Setup on RTX 3090
    # 1. Pull Qwen 3.6 27B
    ollama pull qwen3.6:27b
    # 2. Wire it in
    openclaw config set agents.defaults.models.chat ollama/qwen3.6:27b
    # 3. Use 32K context (24GB has the headroom)
    openclaw config set agents.defaults.context_limit 32768
    # 4. For autonomous runs, prefer gpt-oss 20B (more reliable)
    openclaw config set agents.defaults.models.agent ollama/gpt-oss:20b-q5_K_M
    # 5. Smoke test
    openclaw chat "List the three largest files in my home directory"
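After the smoke test, it is worth confirming that the model really sits in VRAM and was not partly offloaded to the CPU. The sketch below is one way to check, assuming `nvidia-smi` and `ollama` are on your PATH. `vram_free_mib` is a hypothetical helper that parses one line of `nvidia-smi` CSV output.

```shell
# Compute free VRAM in MiB from one "used, total" CSV line.
vram_free_mib() {
  # $1 = a line such as "17250, 24576" (MiB used, MiB total)
  echo "$1" | awk -F', *' '{ print $2 - $1 }'
}

# Query the first GPU only if the NVIDIA tool is installed.
if command -v nvidia-smi >/dev/null 2>&1; then
  line=$(nvidia-smi --query-gpu=memory.used,memory.total \
           --format=csv,noheader,nounits | head -n 1)
  echo "Free VRAM: $(vram_free_mib "$line") MiB"
fi

# List loaded models; the output shows whether each runs on GPU or CPU.
if command -v ollama >/dev/null 2>&1; then
  ollama ps || echo "Ollama server not reachable"
fi
```

If `ollama ps` reports a CPU share for the model, it did not fit entirely in VRAM, and a smaller quant or a lower context limit is the usual fix.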
Common Mistakes on RTX 3090
- Trying to run Llama 3.3 70B at IQ2. It technically fits but quality collapses. Qwen 3.6 27B at Q4 beats it on every benchmark.
- Maxing context to 128K. KV cache eats VRAM fast — at 128K with a 27B Q4 model, you’ll OOM before you fill the context. Cap at 32K, raise selectively.
- Picking Qwen 3.5 27B for OpenClaw. Tool-calling bug in Ollama (GitHub issue #14493). Always use Qwen 3.6 27B.
- Ignoring power supply headroom. RTX 3090 pulls 350W under sustained inference. Make sure your PSU has 100W+ headroom or it’ll throttle / shut down on long runs.
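To see why the 128K context mistake above ends in an out-of-memory error, here is a sketch of the standard KV-cache size formula: two tensors (keys and values) per layer, per KV head, per token. The layer count, KV-head count, and head dimension below are placeholders chosen for illustration, not the real Qwen 3.6 27B configuration, and `kv_cache_mib` is a hypothetical helper.

```shell
# KV cache size in MiB:
#   2 (K and V) * layers * kv_heads * head_dim * context_tokens * bytes_per_element
kv_cache_mib() {
  # $1 = layers, $2 = KV heads, $3 = head dim, $4 = context tokens, $5 = bytes per element
  echo $(( 2 * $1 * $2 * $3 * $4 * $5 / 1048576 ))
}

# Placeholder architecture: 32 layers, 4 KV heads, head dim 128, fp16 cache
kv_cache_mib 32 4 128 32768 2    # 32K context  -> 2048
kv_cache_mib 32 4 128 131072 2   # 128K context -> 8192
```

The cache grows linearly with context, so going from 32K to 128K multiplies it by four. With these placeholder values that is about 8 GiB on top of the weights, which does not fit next to a 16 to 17 GB model on a 24 GB card.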
See Also
- Best Local LLM for RTX 4090: same VRAM, faster bandwidth
- Best Local LLM for RTX 5090: 32GB step up
- Best Local LLM by GPU (hub)
- Best Local LLM by RAM (hub)