What is the best local LLM for an RTX 3090?

Qwen 3.6 27B at Q4_K_M is the best general-purpose pick for an RTX 3090. It uses 17 GB VRAM with 32K context, leaving headroom for the OS. Speed is approximately 35 tokens/sec. For OpenClaw production use, gpt-oss 20B at Q5_K_M is the safer pick because its JSON tool-call output is cleaner.

Is the RTX 3090 still worth buying for LLMs in 2026?

Yes, if you buy used. New 3090s are scarce, but used cards sell for $600-800 on eBay versus $1,500+ for an RTX 4090. For models that fit in 24 GB, the 3090 gives you 70-80% of the 4090's inference speed at roughly half the cost. The 4090 wins on raw memory bandwidth (1008 vs 936 GB/s), but that gap is smaller than the price gap. The 5090, with 32 GB VRAM, is the upgrade if you want to step past 24 GB workloads.
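
To make the value argument concrete, here is a small sketch that turns the figures quoted above into percentages. The helper name `pct` is made up for illustration, and $1,500 is used as the floor of the "$1,500+" 4090 price. Bandwidth alone does not determine throughput, so treat the first number as context rather than a speed prediction.

```shell
#!/usr/bin/env bash
# pct A B: print A as a percentage of B, one decimal place.
# (Hypothetical helper, not part of any tool mentioned in this post.)
pct() {
  awk -v a="$1" -v b="$2" 'BEGIN { printf "%.1f\n", 100 * a / b }'
}

# Memory bandwidth: 3090 (936 GB/s) relative to 4090 (1008 GB/s).
echo "Bandwidth, 3090 vs 4090:      $(pct 936 1008)%"

# Used 3090 price range relative to a 4090 at the $1,500 floor.
echo "Price, low end of used range:  $(pct 600 1500)%"
echo "Price, high end of used range: $(pct 800 1500)%"
```

The bandwidth gap works out to about 7%, while the price sits at roughly 40-53% of the 4090 floor, which is the gap the answer above is pointing at.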

Can the RTX 3090 run Llama 3.3 70B?

Only at degraded quants. Llama 3.3 70B at Q3_K_S uses about 28 GB, too much for a single 3090. At IQ2_XS (about 19 GB) it fits, but quality is significantly degraded. On a single 24 GB GPU, stick with 27-32B class models at higher quants; a 70B-class model at a 3-bit quant lands in roughly the same 28 GB range and will not fit. Two 3090s, linked over NVLink or plain PCIe, are the right move if you need 70B at usable quants.

Hardware May 18, 2026

Best Local LLM for RTX 3090 (2026): 24GB VRAM Picks + OpenClaw Setup

The RTX 3090 is still the best value GPU for local LLMs in 2026. Its 24 GB of VRAM at 936 GB/s memory bandwidth runs Qwen 3.6 27B at Q4 comfortably at ~35 tokens/sec. Used 3090s sell for $600-800 on eBay, about half what a 4090 costs, with roughly 70-80% of the LLM throughput on 24 GB workloads.

Bottom Line

Best overall pick: Qwen 3.6 27B at Q4_K_M (~35 tok/sec)
Best for OpenClaw production: gpt-oss 20B at Q5_K_M (cleanest tool calls)
Best fast pick: Qwen 3.6 35B-A3B at IQ4_XS (MoE — ~50 tok/sec, 3B active params)
Skip: Llama 70B at any quant on a single 3090

Top Picks for RTX 3090 (24 GB VRAM)

1. Qwen 3.6 27B (Q4_K_M) — best overall

The April 22, 2026 release fits perfectly on the 3090. About 17 GB VRAM at Q4_K_M with 32K context. Outperforms the 397B Qwen 3.5 MoE on agentic coding benchmarks (77.2 SWE-Bench Verified).

ollama pull qwen3.6:27b

openclaw config set agents.defaults.models.chat ollama/qwen3.6:27b
openclaw chat "Refactor this function and update the callers"

Expected speed on RTX 3090: 30-40 tokens/sec.
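
To check that range on your own card, one option is to compute tokens/sec from the `eval_count` and `eval_duration` fields that Ollama's `/api/generate` endpoint returns (the duration is in nanoseconds). The helper name `tok_per_sec` and the sample numbers below are illustrative assumptions, not measurements.

```shell
#!/usr/bin/env bash
# tok_per_sec EVAL_COUNT EVAL_DURATION_NS
# Generation speed from Ollama's response fields:
#   eval_count    = tokens generated
#   eval_duration = time spent generating, in nanoseconds
# (Hypothetical helper name; the two fields come from the Ollama API.)
tok_per_sec() {
  awk -v n="$1" -v ns="$2" 'BEGIN { printf "%.1f\n", n * 1e9 / ns }'
}

# To get real values, request a non-streaming completion and read the
# two fields from the JSON response:
#   curl -s http://localhost:11434/api/generate \
#     -d '{"model": "qwen3.6:27b", "prompt": "Hello", "stream": false}'

# Made-up example: 700 tokens generated in 20 seconds.
tok_per_sec 700 20000000000
```

For a quick check without the API, `ollama run qwen3.6:27b --verbose` prints an eval rate after each response.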

2. gpt-oss 20B (Q5_K_M) — best for OpenClaw production

OpenAI’s 20B at Q5 uses about 15 GB. Cleanest tool-call JSON of any open model — exactly what OpenClaw autonomous loops need.

ollama pull gpt-oss:20b-q5_K_M
openclaw config set agents.defaults.models.chat ollama/gpt-oss:20b-q5_K_M
openclaw run --agent "Implement the spec end-to-end"

3. Qwen 3.6 35B-A3B (IQ4_XS) — fastest

Mixture-of-Experts variant of Qwen 3.6: 35B total params, 3B active per token. At IQ4_XS it uses about 19 GB. Because only the active parameters are read for each token, inference runs at roughly the speed of a dense 8B model (~50 tok/sec on RTX 3090).
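
A rough way to see why the MoE variant decodes faster than a dense model of similar size is to estimate how many GB of weights are touched per token. The sketch below assumes every tensor is quantized at the same average rate, which is a simplification, and `active_gb` is a made-up helper name.

```shell
#!/usr/bin/env bash
# active_gb SIZE_GB ACTIVE_B TOTAL_B
# Approximate GB of weights read per token for an MoE model: the full
# weight size scaled by the share of parameters that are active.
# Assumes uniform quantization across tensors (a simplification).
active_gb() {
  awk -v size="$1" -v active="$2" -v total="$3" \
    'BEGIN { printf "%.2f\n", size * active / total }'
}

# Qwen 3.6 35B-A3B at IQ4_XS: ~19 GB total, 3B of 35B params active.
active_gb 19 3 35
```

That is well under 2 GB per token, compared with the ~17 GB a dense 27B at Q4_K_M has to stream. Real speed does not scale in proportion, since routing, KV cache reads, and kernel efficiency all add overhead.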

ollama pull qwen3.6:35b-iq4_xs

4. Nemotron Cascade 2 30B (Q4_K_M) — NVIDIA’s late-March 2026 release

30B dense, 256K context, strong on structured output. About 18 GB at Q4_K_M.

5. Mistral Small 3 22B (Q5_K_M) — alternative

About 16 GB at Q5. Good for European-language workloads, slightly weaker on code than Qwen 3.6.

What Fits in 24 GB VRAM

Model	Quant	VRAM	Tok/sec
Qwen 3.6 27B	Q4_K_M	~17 GB	30-40
Qwen 3.6 35B-A3B (MoE)	IQ4_XS	~19 GB	45-55
gpt-oss 20B	Q5_K_M	~15 GB	40-50
Nemotron Cascade 2 30B	Q4_K_M	~18 GB	28-35
Qwen 3.5 9B	Q8_0	~10 GB	60-80
Llama 3.3 70B	IQ2_XS	~19 GB	8-12 (degraded)
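
The VRAM column can be sanity-checked with a back-of-the-envelope formula: parameters (in billions) times average bits per weight, divided by 8. The bits-per-weight figures below are approximate averages for llama.cpp quant types and should be read as assumptions; `weights_gb` is a hypothetical helper, and the result covers weights only, not KV cache or runtime overhead.

```shell
#!/usr/bin/env bash
# weights_gb PARAMS_B BITS_PER_WEIGHT
# Approximate weight size in GB: params (billions) * bpw / 8.
# Excludes KV cache and runtime buffers.
weights_gb() {
  awk -v p="$1" -v bpw="$2" 'BEGIN { printf "%.1f\n", p * bpw / 8 }'
}

# Approximate average bits per weight (assumed values):
#   Q4_K_M ~4.85, IQ4_XS ~4.25, Q8_0 ~8.5
echo "27B @ Q4_K_M: $(weights_gb 27 4.85) GB"
echo "35B @ IQ4_XS: $(weights_gb 35 4.25) GB"
echo "9B  @ Q8_0:   $(weights_gb 9 8.5) GB"
```

Each estimate lands just under the corresponding table entry, with the remainder going to context and runtime overhead.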

OpenClaw Setup on RTX 3090

# 1. Pull Qwen 3.6 27B
ollama pull qwen3.6:27b

# 2. Wire it in
openclaw config set agents.defaults.models.chat ollama/qwen3.6:27b

# 3. Use 32K context (24GB has the headroom)
openclaw config set agents.defaults.context_limit 32768

# 4. For autonomous runs, prefer gpt-oss 20B (more reliable)
openclaw config set agents.defaults.models.agent ollama/gpt-oss:20b-q5_K_M

# 5. Smoke test
openclaw chat "List the three largest files in my home directory"
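
After the smoke test, it can be worth confirming how much VRAM is actually left. The sketch below parses the "used, total" line that `nvidia-smi` prints in CSV mode; the function name `vram_free_mib` and the sample line are illustrative assumptions.

```shell
#!/usr/bin/env bash
# vram_free_mib: read "used, total" lines (MiB) on stdin and print the
# free VRAM for each GPU. Matches the output of:
#   nvidia-smi --query-gpu=memory.used,memory.total \
#              --format=csv,noheader,nounits
# (Hypothetical helper name.)
vram_free_mib() {
  awk -F', *' '{ print $2 - $1 }'
}

# Sample line standing in for real output on a 24 GB card.
echo "17408, 24576" | vram_free_mib
```

On a real machine, pipe the `nvidia-smi` command from the comment into `vram_free_mib`. `ollama ps` also shows whether the loaded model is running fully on the GPU.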

Common Mistakes on RTX 3090

Trying to run Llama 3.3 70B at IQ2. It technically fits but quality collapses. Qwen 3.6 27B at Q4 beats it on every benchmark. See the RTX 3090 70B guide if that is the exact question.
Maxing context to 128K. KV cache eats VRAM fast — at 128K with a 27B Q4 model, you’ll OOM before you fill the context. Cap at 32K, raise selectively.
Picking Qwen 3.5 27B for OpenClaw. Tool-calling bug in Ollama (GitHub issue #14493). Always use Qwen 3.6 27B.
Ignoring power supply headroom. The RTX 3090 pulls 350W under sustained inference. Make sure your PSU has at least 100W of headroom above total system draw, or the machine can throttle or shut down on long runs.
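
The context-length mistake above comes down to simple arithmetic: KV cache size grows linearly with the number of tokens. The sketch below uses the standard estimate of 2 (keys and values) times layers, KV heads, head dimension, bytes per value, and tokens. The layer and head counts are hypothetical placeholders, not the real Qwen 3.6 27B layout; substitute the values from the model card.

```shell
#!/usr/bin/env bash
# kv_cache_gib LAYERS KV_HEADS HEAD_DIM BYTES_PER_VALUE TOKENS
# KV cache size in GiB:
#   2 (K and V) * layers * kv_heads * head_dim * bytes * tokens
# (Hypothetical helper; architecture numbers below are placeholders.)
kv_cache_gib() {
  awk -v l="$1" -v h="$2" -v d="$3" -v b="$4" -v t="$5" \
    'BEGIN { printf "%.1f\n", 2 * l * h * d * b * t / (1024 ^ 3) }'
}

# Assumed layout: 48 layers, 4 KV heads, head_dim 128, fp16 cache.
echo "32K context:  $(kv_cache_gib 48 4 128 2 32768) GiB"
echo "128K context: $(kv_cache_gib 48 4 128 2 131072) GiB"
```

Under those assumed values, going from 32K to 128K adds about 9 GiB of cache, which would not fit alongside a ~17 GB footprint on a 24 GB card.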
