How much VRAM do I need for a local LLM?

8GB VRAM can run small 7B to 8B models. 16GB VRAM can run stronger 14B to 27B quantized models. 24GB VRAM is a practical sweet spot for local coding and agent workflows. Larger 70B models generally need 40GB or more, even at 4-bit quantization.
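These thresholds follow roughly from model size times bits per weight. The sketch below is a rough estimator, not a measurement: the bits-per-weight values are illustrative assumptions, and it counts weights only, so the context cache and runtime buffers come on top.

```python
# Approximate bits per weight for common quantization levels.
# Illustrative assumptions only; real files vary by quantization scheme.
BITS_PER_WEIGHT = {"Q4": 4.5, "Q6": 6.5, "Q8": 8.5, "F16": 16.0}


def estimate_weight_gb(params_billion: float, quant: str = "Q4") -> float:
    """Return the approximate size of the model weights in GB (10^9 bytes)."""
    bits = BITS_PER_WEIGHT[quant]
    # billions of parameters * bytes per parameter = gigabytes
    return params_billion * bits / 8


if __name__ == "__main__":
    for size in (8, 14, 27, 70):
        print(f"{size}B at Q4: about {estimate_weight_gb(size):.1f} GB of weights")
```

Under these assumptions a 70B model at Q4 lands just under 40GB for weights alone, which is why it does not fit on a single 24GB card.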

Can I run a local LLM without a GPU?

Yes, but it is slower. CPU-only local LLMs are acceptable for testing and small models. For OpenClaw agent workflows, use Apple Silicon unified memory or a GPU with enough VRAM whenever possible.

Local LLM hardware check

Can my computer run a local LLM?


Most modern computers can run a small local LLM, but practical agent work needs more memory. 16GB RAM is the practical minimum, 24GB is the first useful tier, 32GB to 48GB is comfortable, and 64GB or more is where large local models become realistic.

Short answer: probably, but the useful cutoff is higher than most people expect. You can test tiny models on 8GB to 16GB, but reliable OpenClaw agent work starts around 24GB of usable memory and gets much better at 32GB to 64GB.
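If you are not sure how much memory your machine has, a minimal sketch like the following reads it from the operating system. It assumes a POSIX system such as Linux or macOS where `os.sysconf` exposes these names (it does not work on Windows), and it reports installed memory, not the smaller amount that is actually free for a model.

```python
import os


def total_ram_gib() -> float:
    """Return installed physical memory in GiB from POSIX sysconf values."""
    page_size = os.sysconf("SC_PAGE_SIZE")    # bytes per memory page
    page_count = os.sysconf("SC_PHYS_PAGES")  # number of physical pages
    return page_size * page_count / 1024**3


if __name__ == "__main__":
    print(f"Installed RAM: {total_ram_gib():.1f} GiB")
```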

Use the calculator: Enter RAM + VRAM and get exact model recommendations.
Compare RAM tiers: See the hub for every RAM tier from 8GB to 128GB.
32GB vs 64GB? Decide whether the 64GB upgrade is worth it for local LLMs and OpenClaw.
64GB vs 128GB? Pick the serious-work tier or the power-user tier for local AI.
Compare GPUs: Pick by RTX 3090, 4090, 5090, A6000, or Apple Silicon.
Already running slow? Diagnose CPU fallback, swap, context length, quantization, and OpenClaw tool-loop latency.

The practical thresholds

Your hardware	Answer	Realistic model range
8GB RAM	Technically yes, practically no	3B to 7B Q4
16GB RAM	Entry tier	8B to 14B Q4
24GB RAM / VRAM	First practical tier	14B to 27B Q4
32GB RAM	Good local agent tier	27B Q4/Q6
48GB RAM	Comfortable tier	27B Q8 or 35B MoE
64GB RAM	Large-model tier	70B Q4 or gpt-oss 120B Q4
96GB RAM	High-end local tier	120B class Q5 or 122B MoE
128GB RAM	Power-user tier	120B Q6/Q8 and larger MoE setups
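The table above can be turned into a simple lookup. This sketch copies the thresholds and labels from the table; the function name and the choice to round down to the nearest listed tier are assumptions made for illustration.

```python
from bisect import bisect_right

# (minimum memory in GB, answer, realistic model range), copied from the table.
TIERS = [
    (8, "Technically yes, practically no", "3B to 7B Q4"),
    (16, "Entry tier", "8B to 14B Q4"),
    (24, "First practical tier", "14B to 27B Q4"),
    (32, "Good local agent tier", "27B Q4/Q6"),
    (48, "Comfortable tier", "27B Q8 or 35B MoE"),
    (64, "Large-model tier", "70B Q4 or gpt-oss 120B Q4"),
    (96, "High-end local tier", "120B class Q5 or 122B MoE"),
    (128, "Power-user tier", "120B Q6/Q8 and larger MoE setups"),
]


def lookup_tier(memory_gb: float):
    """Return (answer, model_range) for the largest tier that fits, or None below 8GB."""
    thresholds = [tier[0] for tier in TIERS]
    index = bisect_right(thresholds, memory_gb) - 1  # round down to a listed tier
    if index < 0:
        return None
    _, answer, models = TIERS[index]
    return answer, models


if __name__ == "__main__":
    print(lookup_tier(36))  # falls into the 32GB row
```

Rounding down is the conservative choice: a 36GB machine is treated as a 32GB machine rather than an almost-48GB one.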

If you have an Apple Silicon Mac

Treat unified memory like shared RAM/VRAM. A 24GB MacBook can run useful 14B to 27B quantized models. 32GB to 64GB is much better for OpenClaw because context and tool calls add memory pressure.
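To see why context adds memory pressure, it helps to estimate the key/value cache a model keeps per token. The sketch below uses the standard formula for a transformer with grouped-query attention. The model shape (48 layers, 8 KV heads, head size 128, 16-bit cache) is hypothetical and not taken from any specific model, and the estimate ignores cache quantization and sliding-window tricks.

```python
def kv_cache_gib(n_layers: int, n_kv_heads: int, head_dim: int,
                 context_tokens: int, bytes_per_value: int = 2) -> float:
    """Estimate KV cache size in GiB: keys and values for every layer and token."""
    values_per_token = 2 * n_layers * n_kv_heads * head_dim  # 2 = keys + values
    total_bytes = values_per_token * context_tokens * bytes_per_value
    return total_bytes / 1024**3


if __name__ == "__main__":
    # Hypothetical model shape, for illustration only.
    for context in (4096, 32768):
        size = kv_cache_gib(48, 8, 128, context)
        print(f"{context} tokens: {size:.2f} GiB of KV cache")
```

With this assumed shape, a 4,096-token context costs 0.75 GiB while a 32,768-token context costs 6 GiB, on top of the weights. Long agent sessions with many tool results push toward the larger figure.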

If you have an NVIDIA GPU

VRAM is the binding constraint. 8GB is small-model territory, 16GB starts to get useful, and 24GB cards like an RTX 3090 or 4090 are the consumer sweet spot.
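To check how much VRAM a card reports, you can query `nvidia-smi`. This is a minimal sketch that assumes the NVIDIA driver tools are installed and on your PATH; the helper names are made up for illustration, and the function returns an empty list when the tool is missing or fails.

```python
import subprocess


def parse_vram_mib(smi_output: str) -> list[int]:
    """Parse one total-memory value in MiB per GPU from nvidia-smi CSV output."""
    return [int(line.strip()) for line in smi_output.splitlines() if line.strip()]


def query_vram_mib() -> list[int]:
    """Ask nvidia-smi for total memory per GPU; return [] if it is unavailable."""
    command = [
        "nvidia-smi",
        "--query-gpu=memory.total",
        "--format=csv,noheader,nounits",
    ]
    try:
        result = subprocess.run(command, capture_output=True, text=True, check=True)
        return parse_vram_mib(result.stdout)
    except (OSError, subprocess.CalledProcessError, ValueError):
        # Tool not installed, driver error, or unexpected output.
        return []


if __name__ == "__main__":
    for index, mib in enumerate(query_vram_mib()):
        print(f"GPU {index}: {mib / 1024:.1f} GiB VRAM")
```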

If you are CPU-only

You can run local models, but expect slower responses. CPU-only is fine for testing privacy or offline workflows. For daily agent work, use a GPU, Apple Silicon, or a cloud API fallback.

Best next step

Use the calculator first. If the answer is borderline, open the matching RAM guide and check the recommended quantization before buying hardware or changing your OpenClaw config.


Quick answers

Can I run Llama locally?

Yes. Small Llama models run on modest hardware. Llama 3.3 70B needs far more memory; use the 64GB Llama 3.3 guide if that is your target.

Can I run Qwen locally?

Yes. Qwen is one of the better local choices for agent workflows. For the common 16GB VRAM case, start with the Qwen 3.5 27B on 16GB VRAM guide.

What if I only care about privacy?

A small local model may be enough for private drafting, search, or summarization. OpenClaw-style agent work needs stronger tool-calling reliability, so do not judge by whether a model merely starts.
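One way to judge a model beyond "it starts" is to measure how often its output is a well-formed tool call. The sketch below is only an illustration: the JSON shape with `name` and `arguments` keys is an assumed format, not OpenClaw's actual protocol, and the sample outputs are invented.

```python
import json


def is_valid_tool_call(raw: str, known_tools: set[str]) -> bool:
    """Check that raw output is JSON with a known tool name and a dict of arguments."""
    try:
        call = json.loads(raw)
    except json.JSONDecodeError:
        return False
    if not isinstance(call, dict):
        return False
    name = call.get("name")
    return (
        isinstance(name, str)
        and name in known_tools
        and isinstance(call.get("arguments"), dict)
    )


def tool_call_success_rate(outputs: list[str], known_tools: set[str]) -> float:
    """Return the fraction of outputs that are well-formed tool calls."""
    if not outputs:
        return 0.0
    valid = sum(is_valid_tool_call(output, known_tools) for output in outputs)
    return valid / len(outputs)


if __name__ == "__main__":
    # Invented sample outputs: one valid call, one with bad arguments, one plain text.
    samples = [
        '{"name": "read_file", "arguments": {"path": "notes.txt"}}',
        '{"name": "read_file", "arguments": "notes.txt"}',
        "Sure, I will read the file.",
    ]
    print(f"{tool_call_success_rate(samples, {'read_file'}):.0%} valid tool calls")
```

Running the same prompts through two candidate models and comparing the rates gives a more useful signal than load time or tokens per second alone.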