OpenClaw Costs $600/Month? Here's How to Get It Under $20
If your OpenClaw API bill hit $300-600/month, you're not alone. Most of that spend is wasted on bloated context windows, background heartbeats, and using expensive models for simple tasks. Seven changes can cut your bill by 90% or more. One user went from $600/month to under $20. Here is exactly how.
TL;DR: Your OpenClaw bill is high because context windows balloon, heartbeats fire too often, and every task hits your most expensive model. Fix these seven things in order: model routing, prompt caching, heartbeat optimization, session resets, disabling background features, QMD semantic search, and local model offloading. Each fix stacks. Total savings: 90-97%.
Where Your Money Is Actually Going
Before changing anything, you need to understand what is eating your budget. Run /usage in your OpenClaw session right now. Look at the token breakdown. Almost every high bill traces back to four problems.
Context accumulation. Every message in a conversation gets sent back to the model as context. After five turns, your context window is 13x larger than your first message. After twenty turns, you are sending a novel-length prompt for every single interaction. One user traced their token usage and found that a single 30-turn debugging session consumed more tokens than their first two weeks of usage combined. This is the number one cost driver and the one most people miss.
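The growth is easy to quantify with a back-of-the-envelope sketch. The 500-token average per turn is an assumption for illustration, not a measured figure:

```python
# Each turn re-sends the entire conversation so far, so total input
# tokens across a session grow quadratically with turn count.
TOKENS_PER_TURN = 500  # assumed average message size

def cumulative_input_tokens(turns: int) -> int:
    # Turn n carries all prior messages plus the new one: n * TOKENS_PER_TURN.
    return sum(TOKENS_PER_TURN * n for n in range(1, turns + 1))

print(cumulative_input_tokens(5))   # 7500 input tokens billed over 5 turns
print(cumulative_input_tokens(20))  # 105000 over 20 turns
```

Note that quadrupling the turn count multiplies total input spend by roughly fourteen here, which is why long sessions dominate the bill.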
Background heartbeats. OpenClaw’s heartbeat polls your task list on a timer. Each heartbeat sends the full system prompt plus recent context to the model. If your heartbeat fires every 30 seconds, that is 2,880 API calls per day, each carrying your full context window.
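The 2,880 figure falls straight out of the interval arithmetic, and the same sketch shows what a longer interval buys you:

```python
SECONDS_PER_DAY = 86_400

def heartbeats_per_day(interval_seconds: int) -> int:
    # One API call per heartbeat tick.
    return SECONDS_PER_DAY // interval_seconds

print(heartbeats_per_day(30))   # 2880 calls/day at the default 30s interval
print(heartbeats_per_day(300))  # 288 calls/day at a 5-minute interval
```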
Bloated tool outputs. When OpenClaw runs a skill or tool, the raw output gets appended to context. A single file read or web scrape can dump thousands of tokens into your conversation history, inflating every subsequent API call.
Wrong model for the job. Most OpenClaw interactions are simple: classifying a message, formatting a response, acknowledging a task completion. Sending these to Claude Sonnet or GPT-4 when Haiku or GPT-4o-mini handles them perfectly is burning money for zero quality improvement. In testing, Haiku produces identical results to Sonnet on over 85% of typical agent tasks. You are paying 10-20x more for the same output.
The 7 Fixes (In Order of Impact)
1. Switch to Model Routing
Savings: 80-90%. Model routing sends simple tasks to a cheap model and complex tasks to an expensive one. Since the vast majority of agent interactions are simple classification or formatting, this one change cuts most bills dramatically.
```yaml
# config.yaml
model_routing:
  enabled: true
  default_model: "haiku"
  complex_model: "sonnet"
  complexity_threshold: 0.7
```
Simple acknowledgments, task classification, and status checks go to Haiku at a fraction of the cost. Only multi-step reasoning, code generation, and creative tasks escalate to Sonnet. This single change is responsible for the largest cost reduction in most setups. Start here.
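Under the hood, routing is just a threshold check on an estimated complexity score. A minimal sketch, assuming a hypothetical keyword heuristic for the score (real routers typically use a small classifier model, not keyword matching):

```python
# Mirrors the config above: cheap model by default, escalate only when
# estimated complexity crosses the threshold.
COMPLEXITY_THRESHOLD = 0.7
COMPLEX_MARKERS = ("refactor", "implement", "debug", "design", "analyze")

def estimate_complexity(prompt: str) -> float:
    # Toy heuristic: count markers that suggest multi-step work.
    hits = sum(marker in prompt.lower() for marker in COMPLEX_MARKERS)
    return min(1.0, 0.4 * hits)

def pick_model(prompt: str) -> str:
    score = estimate_complexity(prompt)
    return "sonnet" if score >= COMPLEXITY_THRESHOLD else "haiku"

print(pick_model("Acknowledge that the task finished"))         # haiku
print(pick_model("Debug and refactor the session reset code"))  # sonnet
```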
2. Enable Prompt Caching
Savings: 80-90% on input tokens. Prompt caching keeps your system prompt and static context in the provider’s cache. Subsequent calls reference the cache instead of re-sending the full prompt.
```yaml
# config.yaml
prompt_caching:
  enabled: true
  cache_system_prompt: true
  cache_tool_definitions: true
```
If you are using Claude via the Anthropic API, prompt caching is available natively. Your system prompt (often 2,000-4,000 tokens) gets sent once and cached. Every subsequent call pays a fraction of the input cost for that portion.
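To see the scale of the savings, a rough daily-cost estimate. The call volume and per-million-token price are illustrative assumptions; Anthropic bills cache reads at roughly one tenth of the base input rate (cache writes cost slightly more than normal input tokens):

```python
# Daily cost of re-sending a static system prompt, with and without caching.
SYSTEM_PROMPT_TOKENS = 3_000   # typical size cited above
CALLS_PER_DAY = 500            # assumed volume
INPUT_PRICE_PER_MTOK = 3.00    # example rate, dollars per million input tokens

uncached = SYSTEM_PROMPT_TOKENS * CALLS_PER_DAY * INPUT_PRICE_PER_MTOK / 1e6
cached = uncached * 0.10       # every call after the first reads the cache

print(f"uncached: ${uncached:.2f}/day, cached: ~${cached:.2f}/day")
```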
3. Optimize Heartbeat Schedule
Savings: 60-80%. The default heartbeat interval is aggressive. Most users do not need their agent polling every 30 seconds. Reducing the frequency and using a cheaper model for heartbeat checks cuts a massive chunk of background spend.
```yaml
# config.yaml
heartbeat:
  interval_seconds: 300
  model: "haiku"
  slim_context: true
```
Setting slim_context: true strips conversation history from heartbeat calls. The heartbeat only needs to check the task queue, not remember the full conversation.
4. Reset Sessions Regularly
Savings: 50-70%. Context grows with every turn. Resetting your session clears the accumulated history and starts fresh. This is the simplest fix and one of the most effective.
```yaml
# config.yaml
session:
  auto_reset_after_turns: 10
  auto_reset_after_minutes: 30
  preserve_task_list: true
```
With preserve_task_list: true, your queued tasks survive the reset. Only the conversation history clears. You lose nothing functional and your next API call goes from 50,000 tokens of context back down to 3,000.
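The mechanic is simple enough to sketch. This is a hypothetical illustration of a turn-counted reset with a preserved task list, not OpenClaw's actual internals:

```python
# Conversation history clears at the turn limit; the task list survives.
class Session:
    def __init__(self, max_turns: int = 10):
        self.max_turns = max_turns
        self.history: list[str] = []
        self.task_list: list[str] = []  # preserved across resets

    def add_turn(self, message: str) -> None:
        self.history.append(message)
        if len(self.history) >= self.max_turns:
            self.history.clear()        # the reset: drop context, keep tasks

s = Session(max_turns=3)
s.task_list.append("deploy staging")
for msg in ("hi", "status?", "done"):
    s.add_turn(msg)
print(s.history)    # [] -- cleared at the turn limit
print(s.task_list)  # ['deploy staging'] -- preserved
```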
5. Disable Background Features
Savings: 60-80%. OpenClaw generates titles for conversations, auto-tags messages, and runs autocomplete suggestions. Each of these fires a separate API call. If you are paying per token, these background features add up fast.
```yaml
# config.yaml
background_features:
  title_generation: false
  tag_generation: false
  autocomplete: false
  auto_summarize: false
```
Disabling these does not affect core functionality. Your agent still processes tasks, responds to messages, and runs skills. You just lose cosmetic features that were silently burning your budget.
6. Use QMD for Context
Savings: 60-97%. QMD (Query-Matched Documents) replaces full conversation history with semantic search. Instead of sending every previous message as context, QMD pulls only the messages relevant to the current query.
```yaml
# config.yaml
context_strategy:
  mode: "qmd"
  max_results: 5
  similarity_threshold: 0.75
```
On a 50-turn conversation, full history might send 80,000 tokens of context. QMD sends 2,000-5,000 tokens of the most relevant history. The model gets better context (less noise) and you pay for a fraction of the tokens.
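Conceptually, QMD ranks past messages by relevance to the current query and keeps only the top few. A toy sketch using word overlap in place of real embeddings (the actual similarity is semantic, but the selection shape is the same):

```python
# Score each past message by word overlap with the query; keep top matches.
def qmd_select(history: list[str], query: str, max_results: int = 5) -> list[str]:
    query_words = set(query.lower().split())

    def score(msg: str) -> int:
        return len(query_words & set(msg.lower().split()))

    ranked = sorted(history, key=score, reverse=True)
    return [m for m in ranked[:max_results] if score(m) > 0]

history = [
    "Deployed the staging server",
    "The heartbeat interval is 300 seconds",
    "Lunch order confirmed",
]
print(qmd_select(history, "what is the heartbeat interval?"))
# ['The heartbeat interval is 300 seconds', 'Deployed the staging server']
```

The irrelevant message never reaches the model, which is the whole trick: fewer tokens and less noise at the same time.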
7. Offload to Ollama for Simple Tasks
Savings: 100% on offloaded tasks. Ollama runs open-source models locally on your machine. For tasks that do not need frontier-model intelligence, local processing costs nothing beyond electricity.
```yaml
# config.yaml
ollama:
  enabled: true
  model: "llama3"
  endpoint: "http://localhost:11434"
  offload_tasks:
    - "message_classification"
    - "status_checks"
    - "simple_formatting"
    - "task_acknowledgment"
```
Pair this with model routing. Simple tasks go to Ollama (free). Medium tasks go to Haiku (cheap). Complex tasks go to Sonnet (expensive but rare). Your bill reflects only the small percentage of tasks that actually need a frontier model. See our guide on the best local models for OpenClaw for hardware requirements and model recommendations.
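The three tiers combine into a single dispatch decision. A minimal sketch mirroring the setup above; the task names match the offload list, and the model labels are illustrative:

```python
# Tier 1: local (free). Tier 2: cheap cloud. Tier 3: frontier, rare.
OLLAMA_TASKS = {"message_classification", "status_checks",
                "simple_formatting", "task_acknowledgment"}

def route(task_type: str, complexity: float) -> str:
    if task_type in OLLAMA_TASKS:
        return "ollama/llama3"  # runs locally, costs nothing
    if complexity < 0.7:
        return "haiku"          # cheap cloud model
    return "sonnet"             # frontier model, reserved for hard tasks

print(route("status_checks", 0.1))    # ollama/llama3
print(route("code_review", 0.4))      # haiku
print(route("code_generation", 0.9))  # sonnet
```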
Before and After
| Cost Driver | Before | After | Savings |
|---|---|---|---|
| Model (all tasks on Sonnet) | $250 | $25 (routing to Haiku) | 90% |
| Input tokens (no caching) | $120 | $15 (prompt caching) | 87% |
| Heartbeats (every 30s, full context) | $90 | $5 (5min, slim, Haiku) | 94% |
| Context growth (no resets) | $60 | $8 (auto-reset at 10 turns) | 87% |
| Background features | $40 | $0 (disabled) | 100% |
| Full history context | $30 | $2 (QMD) | 93% |
| Simple tasks on cloud | $10 | $0 (Ollama) | 100% |
| Total | $600 | $55 | 91% |
Applying all seven fixes together with aggressive settings pushes the total under $20. The user who hit that number ran Ollama for all simple and medium tasks, reserved Sonnet for complex reasoning only, and kept sessions under five turns. Your exact number will depend on usage volume, but the percentage savings hold regardless of scale. A $300/month user applying the same fixes lands around $10-15.
Try This Now
Run /usage in your next OpenClaw session. Note the token count. Apply fix #1 (model routing) by adding the config above. Run /usage again after a few interactions. Compare the numbers. Most users see an immediate 50%+ drop from this single change.
Next Steps
If your bill is still higher than expected after these changes, dig deeper:
- How to cut OpenClaw API costs covers additional strategies including Claude native format conversion
- Setting up spending limits in OpenClaw prevents surprise bills with hard caps
- Best local models for OpenClaw helps you pick the right Ollama model for your hardware
- The complete OpenClaw costs guide breaks down pricing across every supported provider
Nobody should be paying $600/month for an open-source agent. The defaults ship optimized for capability, not cost: every feature is turned on, every call hits the best model, and context accumulates without limit. That is great for getting started quickly but terrible for your wallet. Thirty minutes of configuration changes all of that, and you can run OpenClaw for the price of a coffee.
Need help optimizing your setup? We configure OpenClaw deployments for teams and individuals in the DC, Maryland, and Virginia area. Book a call and we will walk through your specific usage patterns.