How Much Context Fits in 128GB RAM for a Local LLM?
On a 128GB local LLM machine, do not budget all 128GB for model weights. Reserve memory for the OS, OpenClaw, Ollama or llama.cpp, KV cache, browser/tools, and swap safety. For stable agent work, treat 90-105GB as the practical model-weight budget and cap context before raising it.
Direct Answer
A 128GB local LLM machine is a power-user tier, but it is not a blank 128GB bucket for model weights.
For OpenClaw, use this operating budget:
| Budget line | Reserve |
|---|---|
| macOS, Linux, drivers, browser, editor, shell tools | 8-16GB |
| OpenClaw, Ollama or llama.cpp, file tools, local services | 4-8GB |
| KV cache and context growth | 8-24GB for normal work, more for long context |
| Safety headroom to avoid swap | 8-12GB |
| Practical model-weight budget | about 90-105GB |
If a model’s weights use 95GB, it can fit, but it does not leave much room for a giant context window. If you want stable autonomous agent runs, start conservative and raise context after the workload proves it stays out of swap.
The Simple Formula
Use this before changing context:
usable_memory = installed_memory - os_reserve - process_reserve - safety_headroom
model_budget = usable_memory - kv_cache_budget
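As a rough illustration, the formula can be written as a small Python helper. The function name and the example reserves (12GB OS, 6GB processes, 10GB headroom, 8GB KV cache) are assumptions picked from inside the ranges in the table above, not measured values.

```python
def model_budget_gb(installed_gb, os_reserve_gb, process_reserve_gb,
                    safety_headroom_gb, kv_cache_budget_gb):
    """Return (usable_memory, model_budget) in GB, following the formula above."""
    usable = installed_gb - os_reserve_gb - process_reserve_gb - safety_headroom_gb
    return usable, usable - kv_cache_budget_gb


# Example reserves are assumptions chosen from the table's ranges.
usable, budget = model_budget_gb(128, 12, 6, 10, 8)
print(usable, budget)  # 100 92
```

Raising the KV cache budget from 8GB to 24GB in the same call drops the model budget from 92GB to 76GB, which is why context is worth deciding before picking a model.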
For a 128GB Mac or unified-memory workstation:
- Installed memory: 128GB
- Conservative reserve: 24-32GB
- Practical model + context budget: 96-104GB
- Safe model-weight target for agent work: 90-105GB
That is why two setups can both “fit” but behave very differently:
- 70B model at a practical quant with 32K or 64K context: comfortable.
- 100B-120B model at a heavier quant with 64K or 128K context: possible to fit, but easy to push into swap.
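To see why context changes the picture so much, here is a minimal sketch of the usual KV cache estimate: two tensors (keys and values) per layer, each sized by KV heads times head dimension, per token. The model shape below (80 layers, 8 KV heads, head dimension 128, fp16 cache) is an assumed 70B-class configuration with grouped-query attention, used only for illustration; check your model's actual config, and note that a quantized KV cache would be smaller.

```python
def kv_cache_gib(context_tokens, n_layers, n_kv_heads, head_dim, bytes_per_value=2):
    """Approximate KV cache size in GiB for one sequence.

    The factor of 2 covers keys and values; bytes_per_value=2 assumes fp16.
    """
    per_token_bytes = 2 * n_layers * n_kv_heads * head_dim * bytes_per_value
    return context_tokens * per_token_bytes / 2**30


# Assumed 70B-class shape (illustrative, not taken from any specific model card).
for ctx in (32 * 1024, 64 * 1024, 128 * 1024):
    print(ctx, kv_cache_gib(ctx, n_layers=80, n_kv_heads=8, head_dim=128))
```

Under these assumptions the cache is about 10 GiB at 32K, 20 GiB at 64K, and 40 GiB at 128K, so doubling context doubles the memory the cache takes away from weights.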
Context Budget by Workload
| Workload | Starting context | Why |
|---|---|---|
| Chat and short coding tasks | 16K-32K | Enough for prompt, tool output, and small diffs. Lowest memory risk. |
| OpenClaw agent loops | 32K | Good first setting because tool results and file reads grow fast. |
| Repository analysis | 32K-64K | Use retrieval or chunking before jumping to 128K. |
| Long PDFs or full transcripts | 64K+ | Works best with smaller models or lots of headroom. |
| 100B-class model experiments | 16K-32K | Keep context tight because weights already consume most of the budget. |
The operational rule: raise context only after confirming the model is not swapping. A slow local model is often a memory-pressure problem, not a model-quality problem.
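One way to sanity-check a context setting before testing it is to ask what the largest context is whose cache still fits beside the weights. The sketch below is hypothetical: the 100GB model-plus-context budget and the 0.3125GB of KV cache per 1K tokens are assumed inputs, and the result is an upper bound on paper, not a replacement for watching real runs.

```python
def largest_context_that_fits(budget_gb, weights_gb, kv_gb_per_1k_tokens,
                              candidates=(16, 32, 64, 128)):
    """Return the largest candidate context (in K tokens) whose KV cache fits
    in the memory left after weights, or None if none of them fit."""
    room = budget_gb - weights_gb
    fitting = [c for c in candidates if c * kv_gb_per_1k_tokens <= room]
    return max(fitting) if fitting else None


# Assumed inputs: 100GB for weights plus context, 0.3125GB of cache per 1K tokens.
print(largest_context_that_fits(100, 40, 0.3125))  # 128
print(largest_context_that_fits(100, 90, 0.3125))  # 32
print(largest_context_that_fits(100, 96, 0.3125))  # None
```

Even when the arithmetic allows 128K, the table's starting values are still the safer first setting; the calculation only tells you which steps are worth trying.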
What 128GB Is Good For
128GB is strongest when you need one of these:
- A high-quality 70B-class model with comfortable context.
- A 100B-class model at a practical quantization level.
- Two smaller models kept warm for routing, such as a fast chat model plus a heavier agent model.
- Long-running OpenClaw jobs where swap would silently destroy throughput.
- A private local AI host for a team or home lab.
It is not automatically better if your actual workload is one 20B-30B model. In that case, 48GB or 64GB may be the better spend.
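A quick way to place a model in one of these tiers is the standard approximation that weight size is parameter count times bits per weight, divided by eight. The bits-per-weight values below are assumed effective averages for illustration; real model files add some overhead and vary by quantization format.

```python
def weights_gb(params_billions, bits_per_weight):
    """Approximate weight size in GB: parameters x bits per weight / 8."""
    return params_billions * bits_per_weight / 8


# Bits per weight are assumed effective averages, not exact format sizes.
print(weights_gb(30, 4.5))   # 16.875
print(weights_gb(70, 4.5))   # 39.375
print(weights_gb(120, 4.5))  # 67.5
print(weights_gb(120, 6))    # 90.0
```

On these estimates a 30B model fits easily in far less than 128GB, while a 120B model at around 6 bits per weight already sits at the low end of the 90-105GB weight budget.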
Safe OpenClaw Starting Config
Start with a moderate context cap, then increase once the workload is stable:
openclaw config set agents.defaults.context_limit 32768
openclaw config set agents.defaults.keep_alive 30m
openclaw models status
If memory stays below pressure during real runs, test 64K:
openclaw config set agents.defaults.context_limit 65536
openclaw run --agent "Analyze this repository and propose the smallest safe change"
If the machine swaps, reduce context before changing models. Context is the easiest knob to fix.
128GB vs 64GB
Use 128GB if you are doing serious local AI work: 70B+ models, multiple loaded models, large-document workflows, or private team hosting.
Use 64GB if you mostly run one efficient model, value lower hardware cost, and can keep context moderate.
The buying decision page is here: 64GB vs 128GB RAM for local LLMs.
Quick Troubleshooting
If a 128GB host is slow or unstable:
- Lower context from 128K to 32K.
- Unload secondary models.
- Check whether the process is using CPU fallback.
- Leave at least 8-12GB free during the run.
- Prefer one stable model over a multi-model setup until the workload is repeatable.
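The free-memory and swap checks can be scripted. The sketch below is Linux-only and assumes the standard `/proc/meminfo` fields `MemAvailable`, `SwapTotal`, and `SwapFree`; on macOS you would need a different source such as Activity Monitor's memory pressure view. The helper names and the sample text are hypothetical, and the 8GB threshold mirrors the low end of the headroom range above.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a dict of integer kB values."""
    values = {}
    for line in text.splitlines():
        key, _, rest = line.partition(":")
        fields = rest.split()
        if fields and fields[0].isdigit():
            values[key.strip()] = int(fields[0])
    return values


def memory_pressure(info, min_free_gb=8):
    """Return (available_gb, swap_used_gb, ok) from parsed meminfo values."""
    kb_per_gb = 1024 * 1024
    available = info["MemAvailable"] / kb_per_gb
    swap_used = (info["SwapTotal"] - info["SwapFree"]) / kb_per_gb
    return available, swap_used, available >= min_free_gb and swap_used == 0


# Made-up sample; on Linux, pass the contents of /proc/meminfo instead.
SAMPLE = """MemTotal:       131072000 kB
MemAvailable:    16777216 kB
SwapTotal:        8388608 kB
SwapFree:         8388608 kB
"""
print(memory_pressure(parse_meminfo(SAMPLE)))  # (16.0, 0.0, True)
```

Requiring zero swap is deliberately strict; a host that has been up for a long time may hold a small amount of stale swap, so watching whether swap grows during a run is the more telling signal.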
For a broader diagnostic, see "Why Is My Local LLM So Slow?"
See Also
- OpenClaw Local Model Calculator
- Can I Run a Local LLM With 128GB RAM and No GPU?
- Best Local LLMs for 128GB RAM
- Can My Computer Run a Local LLM?
- Best Local LLM by RAM
- 64GB vs 128GB RAM for Local LLMs