How do I actually measure what Claude Code is costing me?

Install ccusage by running npx ccusage@latest. It reads your local Claude Code session logs and shows today's spend, what remains of the current 5-hour block, and your burn rate. You can pipe it into your statusline so the number is always visible. The repository (ryoppippi/ccusage) has 11,400 stars and is widely used as the first measurement step before optimizing. You cannot cut costs you have not measured.

What is the claude -p billing trap?

The claude -p print-mode flag silently bills via the API even when you have a paid Max plan. One developer documented in GitHub issue #37686 that an overnight claude -p run on a $200 Max plan produced a $1,800 API bill. Always check which authentication mode is active before using -p. If you are on a subscription, avoid -p entirely unless you explicitly want API billing.

What is Caveman Mode and how much does it save?

Caveman Mode (also called Kevin Mode) is a system prompt that strips filler, removes preambles, bans rule-of-three lists, and forces terse output. Users report 61-75% output token reductions. The juliusbrussee/caveman GitHub repo has 5,000+ stars, and both Decrypt and PCWorld covered it in April 2026. At daily-user volumes it typically saves $100-140 per month with no loss of capability.

What is the Advisor pattern?

Advisor is an official Anthropic-endorsed pattern: use Opus as a planner/advisor (one call per task) and Sonnet or Haiku as the executor that does the actual work. Anthropic's own benchmark shows +2.7 percent SWE-bench Multilingual while costing 11.9 percent less. The @claudeai announcement post had 4.6M views and 38K likes. For a daily user this is a compounding 10-20 percent saving with a small quality gain.

Why should I restart Claude sessions at 200K tokens?

Every message in a Claude Code session resends the full conversation history as context. Past 200K tokens the marginal cost per message grows fast — some users report 300K+ sessions cost 10-15x more than the same work split across fresh sessions. The pattern: when your session crosses 200K tokens, write a handoff document of what is done and what is next, run /new, paste the handoff, continue. One user reported an 80 percent monthly drop from this change alone.

What is ENABLE_TOOL_SEARCH and should I turn it on?

ENABLE_TOOL_SEARCH is a Claude Code setting that lazy-loads tool definitions instead of injecting every tool schema into each turn. With it off, a default Claude Code session burns roughly 45K tokens per turn on tool definitions alone. Adding it to settings.json drops per-turn context to around 20K tokens, a 55 percent reduction. For OpenClaw users routing requests to Claude, enabling it typically saves $50-100 per month on API costs with no downside.

Can I run OpenClaw for free?

Yes. Install OpenClaw on your existing computer, run Ollama with Qwen 3.6 or GLM-5.1, and pay zero in API fees. Qwen3.6-Plus is also available free on OpenRouter as a one-env-var swap from Opus. The tradeoff is slower inference than cloud APIs and slightly lower capability on complex reasoning tasks, but for most automations the free stack is genuinely sufficient.

How much does OpenClaw cost for a team?

A team running OpenClaw with optimized settings (ENABLE_TOOL_SEARCH, advisor pattern, caveman mode, ccusage monitoring) typically lands at $80-200 per month total even with 10+ developers, because the per-developer cost caps out around $15-25 once the fixes are applied. Unoptimized, the same team can burn $3,000-10,000 per month.

How do I hard-cap a single Claude Code invocation?

Use the --max-budget-usd flag on the claude CLI, or set workspace rate limits in the Anthropic Console. Both prevent a single runaway agent loop from destroying your budget. Combine with ccusage for visibility — you want both a soft warning (ccusage in statusline) and a hard cap (--max-budget-usd).

Guide April 20, 2026

OpenClaw Costs: How I Went From $1,600/mo to $180/mo (10 Fixes That Actually Worked)

Same developer. Same stack. Two weeks apart.

Before: $1,600 per month. Opus on every call. Sessions that ballooned to 300K tokens by lunchtime. One particularly bad night on claude -p that almost cracked $500 in six hours.

After: $180 per month. Zero capability lost. Most days he forgets to check the bill because it just does not move.

Every fix below is ranked by how much it saved him. The first three recover bills people did not know they were bleeding. The last three are advanced moves that compound once the basics are in place.

Step 0: Measure Before You Optimize (ccusage)

You cannot cut what you cannot see. Before any fix, install ccusage:

npx ccusage@latest

It reads your local Claude Code session logs and prints today’s spend, what remains of the current 5-hour block, and your burn rate. ryoppippi/ccusage crossed 11,400 GitHub stars in April 2026 and is widely used as the first measurement step before optimizing.

Pipe it into your statusline so the number is visible on every prompt. Once the dollar amount is in your peripheral vision, usage drops 20-25 percent on its own — behavior change, no code change.
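If you want to sanity-check the burn-rate number yourself, the arithmetic is simple. The sketch below is not how ccusage computes it; it is a minimal illustration that assumes you already know how much you have spent since the current 5-hour block started. The function name and the sample figures are hypothetical.

```python
from datetime import datetime, timedelta

BLOCK_HOURS = 5  # length of the usage block discussed above


def burn_rate(spent_usd: float, block_start: datetime, now: datetime) -> dict:
    """Return dollars per hour so far and the projected spend for the full block."""
    elapsed_h = (now - block_start).total_seconds() / 3600
    if elapsed_h <= 0:
        raise ValueError("now must be later than block_start")
    per_hour = spent_usd / elapsed_h
    return {
        "usd_per_hour": round(per_hour, 2),
        "projected_block_usd": round(per_hour * BLOCK_HOURS, 2),
        "hours_left": round(max(BLOCK_HOURS - elapsed_h, 0.0), 2),
    }


if __name__ == "__main__":
    # Hypothetical sample: $6 spent two hours into a block.
    start = datetime(2026, 4, 20, 9, 0)
    print(burn_rate(6.0, start, start + timedelta(hours=2)))
```

With the hypothetical sample above, $6 in two hours is $3 per hour, which projects to $15 for the whole block if nothing changes.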

“Didn’t realize I was pushing $40/day until I put ccusage in my statusline. Saw the number, changed my habits, cut it to $8/day in a week.” — @meta_alchemist

Install ccusage before running any of the fixes below. The point of optimizing is to see the delta.

1. The `claude -p` trap (up to $1,800 in one night)

This is the fix that saves catastrophic bills, not incremental ones.

The claude -p (print mode) flag silently bills via the API even when you have a paid Max or Pro plan. GitHub issue #37686 documents one developer who ran claude -p overnight on a $200/month Max plan and woke up to an $1,800 API bill. It is not a bug in the traditional sense — the flag is documented to use API mode — but almost nobody reads that line.

Check which mode you are in before every -p call:

claude config get auth.mode
# Returns: "subscription" or "api"

The rule: if you are on a subscription, never use -p unless you explicitly want API billing. For scripted workflows, either drive interactive mode through a pseudo-tty or set a spending cap in the API billing dashboard first, so a runaway script at least hits a limit.
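A cheap extra guard for scripts is to refuse to start when the environment looks like it will bill the API. The sketch below assumes that an exported ANTHROPIC_API_KEY is what puts your scripted runs on API billing. That is an assumption about your setup, not something the issue above confirms, and the function name is hypothetical.

```python
import os
import sys


def api_key_billing_likely(env) -> bool:
    """True when a non-empty ANTHROPIC_API_KEY is present (assumed API-billing signal)."""
    return bool(env.get("ANTHROPIC_API_KEY", "").strip())


if __name__ == "__main__":
    if api_key_billing_likely(os.environ):
        # Stop before the wrapped command runs; the operator can unset the key or opt in.
        sys.exit("ANTHROPIC_API_KEY is set: this run may bill the API. Aborting.")
    print("No API key in the environment; continuing.")
```

Call it at the top of any overnight wrapper script so the check happens before the first request, not after the first invoice.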

“Worst night of my life. $1,800 before I saw the email. The flag name is -p. It should be --use-my-credit-card-now.” — zephyr_z9 (reporting a similar incident; post cleared 195K views)

2. Caveman Mode — 61-75% output token reduction

The defining cost-saver of April 2026. juliusbrussee/caveman is a single CLAUDE.md / system-prompt snippet that strips filler, bans preambles, removes rule-of-three lists, and forces terse output. Output token volume drops 61-75 percent with no loss of code quality.

Add this to your CLAUDE.md:

# Response rules

- No preamble. No "I'll help you with that."
- No summaries of what you did unless asked.
- No rule-of-three. Give the one best answer.
- No filler adverbs (basically, actually, essentially).
- Code first. Explanation only if requested.
- If the task has a single answer, give only that answer.

Daily Claude Code users at Sonnet rates report typical savings of $100-140/month from this alone. The repo has 5,000+ GitHub stars, 10K+ upvotes on r/ClaudeAI, and coverage in Decrypt and PCWorld.

Combine with the ENABLE_TOOL_SEARCH fix (#3) and you have already recovered most of what a fresh session bills.

3. ENABLE_TOOL_SEARCH — the 45K-token tool-schema tax

By default, Claude Code injects every registered tool definition into every single turn. That averages 45,000 tokens per turn just for tool descriptions. A 926-session audit over 33 days totaled $1,619 — most of it was this.

One-line fix in ~/.claude/settings.json:

{
  "env": {
    "ENABLE_TOOL_SEARCH": "true"
  }
}

With the flag on, tool definitions lazy-load: the model searches for tools it needs and loads only those. Per-turn context drops from ~45K to ~20K tokens, a 55 percent cut on the tool-schema portion of your prompt. Typical savings: $50-100/month at Sonnet rates.
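To see where a figure like that can come from, here is the arithmetic as a small sketch. The 45K and 20K token counts are the ones quoted above; the 1,000 turns per month and the $3 per million input tokens are assumptions chosen for illustration, and the calculation ignores prompt caching.

```python
def reduction(before: int, after: int) -> float:
    """Fractional drop in per-turn tokens."""
    return (before - after) / before


def monthly_saving_usd(before: int, after: int,
                       turns_per_month: int, usd_per_million_input: float) -> float:
    """Dollars saved per month from sending fewer tokens on every turn."""
    saved_tokens = (before - after) * turns_per_month
    return saved_tokens / 1_000_000 * usd_per_million_input


if __name__ == "__main__":
    print(f"{reduction(45_000, 20_000):.1%}")
    # Assumed: 1,000 turns per month at $3 per million input tokens.
    print(monthly_saving_usd(45_000, 20_000, 1_000, 3.0))
```

Swap in your own turn count from ccusage to get a number that reflects your usage rather than the assumed one.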

OpenClaw users: OpenClaw does not yet expose an identical flag under its own config, but any agent you wire to Claude through the OpenClaw gateway inherits the underlying Claude Code settings. Edit ~/.claude/settings.json, add the env block, restart the gateway with openclaw gateway restart, and check the next session’s token count.

4. Restart sessions at 200K tokens (never go past 300K)

Every message in a Claude Code session resends the full conversation history as context. Past 200K tokens, the marginal cost per message grows fast. Past 300K, you are paying 10-15x more for the same work compared to splitting across fresh sessions.
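The growth is easier to see with numbers. The sketch below is a simplified model, not a billing calculator: it assumes every turn adds a fixed number of tokens, that the whole history is resent as input each turn, and that prompt caching is ignored. The turn counts and the 2,000-token handoff are hypothetical, and the result shows the shape of the curve rather than reproducing the 10-15x figure.

```python
def total_input_tokens(turns: int, tokens_per_turn: int, start_context: int = 0) -> int:
    """Sum the input tokens sent when every turn resends the whole history."""
    total = 0
    context = start_context
    for _ in range(turns):
        context += tokens_per_turn  # the new message joins the history
        total += context            # the full history is sent again
    return total


def split_sessions(turns: int, tokens_per_turn: int,
                   sessions: int, handoff_tokens: int) -> int:
    """Same work split evenly across fresh sessions, each seeded with a handoff doc."""
    if turns % sessions:
        raise ValueError("turns must divide evenly across sessions")
    per_session = turns // sessions
    return sessions * total_input_tokens(per_session, tokens_per_turn, handoff_tokens)


if __name__ == "__main__":
    one_long = total_input_tokens(100, 3_000)          # ends at 300K context
    four_short = split_sessions(100, 3_000, 4, 2_000)  # each ends near 77K
    print(one_long, four_short, round(one_long / four_short, 2))
```

Under these assumptions, one 100-turn session sends 15,150,000 input tokens, while four 25-turn sessions send 4,100,000 for the same work. The gap widens the longer the single session runs, because the total grows with the square of the turn count.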

The pattern:

When your session crosses 200K (ccusage shows this), stop.
Ask Claude to write a handoff doc: what is done, what is next, relevant file paths, outstanding questions.
Save the handoff, run /new, paste it, continue.

One user on buildtolaunch documented an 80 percent monthly drop from this single behavior change after his $1,600/month postmortem. Restart discipline is the highest-ROI behavior shift.

“Stopped letting sessions run past 200K. Handoff doc takes 30 seconds. Monthly bill went from $1,600 to $320. Same work. Same quality.” — paraphrased from the buildtolaunch writeup

5. The Advisor pattern (Opus + Sonnet/Haiku split)

An official Anthropic-endorsed pattern: use Opus as the advisor (one call per task, plans the approach) and Sonnet or Haiku as the executor (does the actual work). Anthropic’s own benchmark shows +2.7% SWE-bench Multilingual while costing 11.9% less.

The @claudeai announcement post drew 4.6M views and 38K likes. Set it up in settings.json:

{
  "models": {
    "planner": "claude-opus-4-6",
    "executor": "claude-sonnet-4-6",
    "reviewer": "claude-haiku-4-5"
  },
  "routing": {
    "plan_first": true,
    "reviewer_only_on_commit": true
  }
}

For a daily user, this compounds into a 10-20 percent saving with a small quality gain — the planner catches architectural issues before the executor wastes tokens coding the wrong thing.
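The control flow behind the pattern is small enough to sketch. This is an illustration of the idea only, not Anthropic's implementation: call_model is a stand-in that records which model would be called instead of hitting an API, and the model labels "opus" and "sonnet" are placeholders.

```python
from dataclasses import dataclass, field


@dataclass
class CallLog:
    calls: list = field(default_factory=list)

    def call_model(self, model: str, prompt: str) -> str:
        """Stand-in for a real API call; records the model and echoes the prompt."""
        self.calls.append(model)
        return f"[{model}] {prompt}"


def run_task(task: str, steps: list, log: CallLog,
             planner: str = "opus", executor: str = "sonnet") -> list:
    """One planner call per task, then one executor call per step of the work."""
    plan = log.call_model(planner, f"Plan the approach for: {task}")
    return [log.call_model(executor, f"{step} | plan: {plan}") for step in steps]


if __name__ == "__main__":
    log = CallLog()
    run_task("add pagination", ["write query", "update handler", "add tests"], log)
    print(log.calls)  # the expensive model appears once, the cheap one per step
```

The saving comes from the ratio: however many steps the task takes, the expensive model is billed for a single planning call.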

6. Audit your MCP servers (2-5K tokens each, per turn)

Every MCP server you enable adds 2-5K tokens per turn in tool definitions, even if you never use that server in the current session. At those rates, three or four servers add roughly 6-20K tokens of overhead before your prompt even starts.

Audit command:

claude mcp list
# Then for each one you don't use daily:
claude mcp remove <server-name>
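To decide which servers to drop, it helps to put a token figure next to each one. The sketch below assumes you have estimated each server's per-turn schema size yourself; the server names and token counts are hypothetical, and nothing here reads real Claude Code configuration.

```python
def audit(servers: dict, used_daily: set) -> tuple:
    """Return the servers not used daily and the per-turn tokens recovered by removing them."""
    unused = sorted(name for name in servers if name not in used_daily)
    recovered = sum(servers[name] for name in unused)
    return unused, recovered


if __name__ == "__main__":
    # Hypothetical per-turn schema sizes, in tokens.
    servers = {"github": 4_500, "postgres": 3_000, "slack": 2_500, "figma": 5_000}
    unused, recovered = audit(servers, used_daily={"github", "postgres"})
    print(unused, recovered)
```

In this made-up example, removing the two servers that are not used daily recovers 7,500 tokens on every turn.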

Also add a .claudeignore at the repo root to exclude large directories (build output, node_modules, generated code) from the codebase context.

node_modules/
dist/
build/
coverage/
*.min.js
*.generated.ts

Typical recovery: 10-20 percent of your context budget, which compounds across every turn.

7. Advanced settings.json (compactThreshold + subagentModel)

A viral @LLMJunky post (254K views) shared a config block that layered on top of ENABLE_TOOL_SEARCH:

{
  "env": { "ENABLE_TOOL_SEARCH": "true" },
  "compactThreshold": 200000,
  "subagentModel": "claude-haiku-4-5",
  "autoCompact": true
}

compactThreshold: 200000 forces context summarization at 200K tokens instead of the default 300K+. subagentModel: claude-haiku-4-5 routes spawned subagents (for sub-tasks like “search this codebase”) to Haiku instead of your primary model. Reports: 30-50 percent reduction on top of the earlier fixes.

8. Qwen 3.6-Plus free on OpenRouter ($200/mo → $0 for a lot of tasks)

You do not always need Opus. @RoundtableSpace posted an env-var swap that routes a subset of your work to Qwen3.6-Plus on OpenRouter’s free tier:

export ANTHROPIC_BASE_URL=https://openrouter.ai/api
export ANTHROPIC_API_KEY=<your-openrouter-key>
export ANTHROPIC_MODEL=qwen/qwen-3.6-plus

Qwen3.6-Plus has a 1M-token context window and is available on OpenRouter’s free tier (rate-limited, but adequate outside of intense bursts). Use it for drafting, research, and summarization. Keep Claude Opus/Sonnet for mission-critical code. For the right mix of tasks, this replaces $100-200/month of API spend with $0.

See our Qwen 3.5 27B on a single RTX 3090 benchmark for the case that smaller Qwen models also beat larger MoE rigs on agent coding.

9. Hard-cap invocations with `--max-budget-usd`

After everything above, you still want a circuit breaker. A runaway agent loop can still spike if a prompt goes wrong.

claude --max-budget-usd 5.00 --agent ship-feature

The flag aborts the invocation when the running cost crosses $5. Combine with workspace rate limits in the Anthropic Console for a second layer. The two together prevent any single bad run from hurting.
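If you run your own agent loop, the same circuit-breaker idea takes a few lines. The sketch below illustrates the concept and makes no claim about how the CLI flag works internally; the class names and the per-step costs are hypothetical.

```python
class BudgetExceeded(RuntimeError):
    """Raised when accumulated spend passes the cap."""


class BudgetGuard:
    def __init__(self, max_usd: float) -> None:
        self.max_usd = max_usd
        self.spent_usd = 0.0

    def charge(self, usd: float) -> None:
        """Add one step's cost and abort once the running total crosses the cap."""
        self.spent_usd += usd
        if self.spent_usd > self.max_usd:
            raise BudgetExceeded(f"spent ${self.spent_usd:.2f}, cap ${self.max_usd:.2f}")


def run_loop(step_costs, max_usd: float) -> int:
    """Run steps until they finish or the cap trips; return how many completed."""
    guard = BudgetGuard(max_usd)
    completed = 0
    for cost in step_costs:
        try:
            guard.charge(cost)
        except BudgetExceeded:
            break  # stop the loop instead of letting it keep spending
        completed += 1
    return completed


if __name__ == "__main__":
    # Hypothetical runaway loop: ten steps at $1.50 each against a $5 cap.
    print(run_loop([1.5] * 10, 5.0))
```

In the hypothetical run, the loop stops on the fourth step, when the total would reach $6 against a $5 cap, so three steps complete.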

10. CLAUDE.md under 200 lines (move everything else out)

Your CLAUDE.md gets re-loaded on every session start. A 600-line CLAUDE.md is a 600-line tax on every conversation.

The pattern, recommended by both @mattpocockuk (862 likes, 52K views) and @techNmak (4,503 likes, 322K views):

CLAUDE.md: under 200 lines. Commands, file structure, behavioral rules, key conventions.
CODE_STANDARDS.md: detailed style guide, loaded only during code review via a review-only agent.
ARCHITECTURE.md: referenced by name, loaded only when the task actually needs it.
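A line budget only holds if something checks it. Here is a minimal sketch of such a check, suitable for a pre-commit hook or CI step. The 200-line limit comes from the recommendation above; the function name and the assumption that CLAUDE.md sits in the current directory are mine.

```python
from pathlib import Path


def check_line_budget(text: str, limit: int = 200) -> tuple:
    """Return (line_count, within_budget) for the given file contents."""
    count = len(text.splitlines())
    return count, count <= limit


if __name__ == "__main__":
    path = Path("CLAUDE.md")  # assumed location: repo root
    if path.exists():
        count, ok = check_line_budget(path.read_text(encoding="utf-8"))
        print(f"{path}: {count} lines, {'ok' if ok else 'over budget'}")
    else:
        print(f"{path} not found")
```

Run it whenever CLAUDE.md changes, so the file cannot quietly grow back to 600 lines.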

“Spend tokens at review time, not implementation time. Your CLAUDE.md shouldn’t teach the language, just the conventions.” — @mattpocockuk

Pair with our CLAUDE.md Reviewer tool to find the vague sections first.

Three Paths — Decision Tree

Once the fixes above are applied, the right setup depends on one question: what is your goal?

If your goal is $0/month → Hobbyist path

OpenClaw + Ollama + Qwen 3.6 or GLM-5.1 locally. No cloud API. Electricity only. Best for: learning, personal automations, privacy-sensitive work. Read our best local models guide.

The biggest cost cut is moving to a local model on your own hardware. A 24 GB Mac is the practical minimum; 48 GB+ runs the larger models comfortably.

If your goal is reliable daily production → Solo path ($15-30/month)

Hetzner VPS ($4-8) + Claude Sonnet API with all 10 fixes applied ($6-13) + occasional GPT-4o-mini calls ($5-10). Best for: freelancers, indie builders, solo consultants. Apply ccusage + Caveman Mode + ENABLE_TOOL_SEARCH + session restarts at 200K before anything else.

If your goal is team at scale → Team path ($80-200/month)

DigitalOcean droplet ($12-48) + Claude Sonnet API per developer ($50-130) + monitoring stack ($15-20). The fixes above matter 10x more at team scale — an unoptimized 10-person team burns $3,000-10,000/month; optimized, the same team lands at $80-200.

Quick Reference Table

Fix	Effort	Typical Savings
0. Install ccusage	30 sec	20-25% (behavior change)
1. Avoid `claude -p` on subscription	0 sec	Prevents catastrophic bills
2. Caveman Mode in CLAUDE.md	2 min	$100-140/mo
3. ENABLE_TOOL_SEARCH	1 min	$50-100/mo
4. Restart sessions at 200K	Behavior	Up to 80% monthly drop
5. Advisor pattern (Opus + Sonnet)	5 min	11.9% (Anthropic bench)
6. Audit MCP servers + .claudeignore	5 min	10-20% context
7. compactThreshold + subagentModel	2 min	30-50% stacked
8. Qwen 3.6-Plus on OpenRouter free	3 min	Up to $200/mo replaced with $0
9. `--max-budget-usd` per run	0 sec	Circuit breaker
10. CLAUDE.md under 200 lines	15 min	10-15% per session

Still got a surprise bill? Book a Call and we will audit your setup and apply the top three fixes live.

Last updated: April 2026.
