Prompt caching — how I calculate Opus 4.7 costs (and why the bill isn't scary)

Opus 4.7 has a higher per-token price than Sonnet 4.6. Looking at the price sheet, twice as much. Looking at the bill after a month, the gap is small. The secret is prompt caching, the mechanism that changes long-session economics.

How the cache works

The Anthropic API has supported prompt caching for a while. In Claude Code it's on by default. The default cache TTL is 5 minutes.

When you send a prompt:

The system checks whether the initial part of the context (system prompt + tools + conversation history) was used recently
If yes (cache hit) → you pay 10% of the standard input price for the cached tokens
If no (cache miss) → you pay 125% of the standard input price (cache write cost)

So the first message after a pause is more expensive; every following one within the 5-minute window is cheap.
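The billing rule above can be sketched as a small function. This is a simplified model, not an API call: it assumes the whole prompt is one cacheable prefix and that the TTL is measured from the previous request. The function and constant names are illustrative.

```python
CACHE_TTL_SECONDS = 5 * 60   # 5-minute TTL quoted above
HIT_MULTIPLIER = 0.10        # cache hit: 10% of standard input price
WRITE_MULTIPLIER = 1.25      # cache miss: 125% (cache write)


def prompt_cost_multiplier(seconds_since_last_request):
    """Return the fraction of standard input price paid for the prompt.

    Pass None for the first request of a session (nothing is cached yet).
    """
    if seconds_since_last_request is None:
        return WRITE_MULTIPLIER
    if seconds_since_last_request < CACHE_TTL_SECONDS:
        return HIT_MULTIPLIER
    return WRITE_MULTIPLIER


print(prompt_cost_multiplier(None))  # first message of the session
print(prompt_cost_multiplier(30))    # 30 seconds after the previous one
print(prompt_cost_multiplier(360))   # after a 6-minute pause
```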

Concrete bill

A 30-minute session with 8 turns:

Turn	Time since last	Cache state	Prompt cost
1	start	miss	125% (write)
2	30s	hit	10%
3	1min	hit	10%
4	2min	hit	10%
5	6min	miss (TTL)	125% (write)
6	1min	hit	10%
7	2min	hit	10%
8	1min	hit	10%

2 cache writes (expensive) + 6 cache hits (cheap). By the table, assuming equal-sized prompts, the average prompt cost is (2 × 125% + 6 × 10%) / 8 ≈ 39% of standard price. Real data from my billing: a typical 30-minute session is $0.40-0.60 instead of the expected $1.50-2.00.
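The table can be replayed in a few lines to check the arithmetic. This sketch assumes every turn sends a prompt of the same size, which real sessions do not (context grows), so treat the result as a rough figure. Under that assumption the average works out to roughly 39% of standard price.

```python
# Gap in seconds before each of the 8 turns in the table; None marks session start.
GAPS = [None, 30, 60, 120, 360, 60, 120, 60]

TTL = 300                  # 5-minute cache TTL, in seconds
HIT, WRITE = 0.10, 1.25    # multipliers on standard input price


def multiplier(gap):
    """Hit if the previous request was less than TTL seconds ago, else a write."""
    return HIT if gap is not None and gap < TTL else WRITE


multipliers = [multiplier(g) for g in GAPS]
writes = multipliers.count(WRITE)
hits = multipliers.count(HIT)
average = sum(multipliers) / len(multipliers)

print(f"{writes} writes, {hits} hits, average {average:.0%} of standard price")
```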

What breaks the cache

Three most common surprise-cost causes:

1. Idle pause > 5 min. Agent waits, I go for coffee, come back, cache dead. Cold start = full cost.

2. System prompt edit. Changing CLAUDE.md mid-session or /model flip → entire cache invalidated.

3. Long context > 200k. Above 200k tokens I see the cache behave differently (it looks like a sliding window), and turn cadences that normally hit sometimes miss. Practical consequence: on big tasks I plan context up front.
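One way to picture why a system prompt edit or a model switch throws everything away: caching matches on an exact prefix of the request. The sketch below is a toy model of that idea using a hash over the stable part of the prefix. It is not Anthropic's implementation, and `prefix_key` and the placeholder model names are hypothetical.

```python
import hashlib
import json


def prefix_key(model, tools, system_prompt):
    """Toy cache key: a hash over the stable start of every request."""
    payload = json.dumps(
        {"model": model, "tools": tools, "system": system_prompt},
        sort_keys=True,
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


base = prefix_key("model-a", ["bash", "edit"], "Project rules v1")

# Identical prefix -> identical key -> the cached entry can be reused.
same = prefix_key("model-a", ["bash", "edit"], "Project rules v1")

# Editing the system prompt or switching model changes the key -> cold start.
edited = prefix_key("model-a", ["bash", "edit"], "Project rules v2")
switched = prefix_key("model-b", ["bash", "edit"], "Project rules v1")

print(base == same, base == edited, base == switched)
```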

Cache-hold strategy

What I do to keep the cache working for me:

1. Decouple idle work. If I know I have to step away for 10 minutes, I end the session, start a new one when I return. I don't leave Claude "waiting". The cold start costs once, but everything after is cheap.

2. Batch decisions. Instead of 5 prompts at 1-minute intervals, I formulate 1 longer prompt with 5 questions. Every prompt re-reads the whole cached context at the 10% rate, so fewer prompts = fewer paid passes over that context.

3. I skip the small stuff. Tiny acknowledgments ("ok", "yes", "next") are cache hits, and per the Anthropic docs each hit refreshes the TTL, so they do keep the cache warm. But each one still re-reads the whole context at the 10% rate, so I fold them into the next real prompt where I can.

4. Plan mode for big tasks. Plan mode builds full context in one phase. After it's done I have a fresh cache for the implementation phase.
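The batching point (item 2) is easy to put numbers on. A minimal sketch, assuming a warm cache, ignoring output tokens and context growth, and using a made-up context size; costs are in "standard-price token equivalents".

```python
HIT_MULTIPLIER = 0.10  # cache hit: 10% of standard input price


def cached_read_cost(context_tokens, prompts):
    """Token-equivalents billed for re-reading a warm context `prompts` times."""
    return context_tokens * HIT_MULTIPLIER * prompts


context_tokens = 50_000  # hypothetical context size

separate = cached_read_cost(context_tokens, prompts=5)  # five one-question prompts
batched = cached_read_cost(context_tokens, prompts=1)   # one five-question prompt

print(f"separate: {separate:.0f}, batched: {batched:.0f}")
```

Five separate prompts pay for five passes over the same context; the batched prompt pays for one.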

Measuring in practice

The Anthropic console shows per-call cost. I check weekly:

Sessions > 10 min: avg $0.45 per session
Sessions 5-10 min: avg $0.30 per session
Sessions < 5 min: avg $0.20 per session

Shorter sessions are nominally cheaper, but the initial cache write dominates their cost. A session under 5 minutes at $0.20 costs at least $0.04 per minute; a 30-minute session at $0.40-0.60 costs about $0.013-0.02 per minute.
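For measuring per call rather than per week, the Messages API response carries a `usage` object with `input_tokens`, `cache_creation_input_tokens`, and `cache_read_input_tokens`. A minimal sketch of turning those counters into an input-side cost, using the 125% and 10% multipliers from earlier. The price and the sample counters are placeholders, not real rates or real data, and output tokens are ignored.

```python
def input_cost_usd(usage, base_price_per_mtok):
    """Input-side cost of one call, computed from its usage counters.

    `usage` is a dict shaped like the API's usage object;
    `base_price_per_mtok` is the standard input price per million tokens.
    """
    uncached = usage.get("input_tokens", 0)
    written = usage.get("cache_creation_input_tokens", 0)
    read = usage.get("cache_read_input_tokens", 0)
    weighted_tokens = uncached + 1.25 * written + 0.10 * read
    return weighted_tokens * base_price_per_mtok / 1_000_000


# Placeholder counters: a warm turn reads 40k cached tokens and writes 2k new ones;
# a cold turn has to write all 42k.
warm_turn = {
    "input_tokens": 500,
    "cache_creation_input_tokens": 2_000,
    "cache_read_input_tokens": 40_000,
}
cold_turn = {
    "input_tokens": 500,
    "cache_creation_input_tokens": 42_000,
    "cache_read_input_tokens": 0,
}

price = 10.0  # placeholder USD per million input tokens, not a real rate
print(f"warm: ${input_cost_usd(warm_turn, price):.3f}")
print(f"cold: ${input_cost_usd(cold_turn, price):.3f}")
```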

Anti-pattern: "more often, shorter"

Intuition: shorter sessions are cheaper, so make them shorter. Not true. Each short session pays its own cache write, so the write is a proportionally larger share of its cost.

Better: one 30-minute session rather than 6 × 5-minute ones spread over an hour.
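The same point in numbers, as a rough sketch. It assumes equal-sized prompts, no pause over 5 minutes inside a session, and hypothetical turn counts (12 turns total either way); costs are in units of one standard-price prompt.

```python
HIT, WRITE = 0.10, 1.25  # multipliers on standard input price


def session_cost(turns):
    """Prompt cost of one uninterrupted session: one write, then hits."""
    if turns < 1:
        raise ValueError("a session needs at least one turn")
    return WRITE + HIT * (turns - 1)


one_long = session_cost(12)       # one session, 12 turns
six_short = 6 * session_cost(2)   # six sessions, 2 turns each

print(f"one long: {one_long:.2f}, six short: {six_short:.2f}")
```

Six cold starts cost far more than one, even though the number of turns is identical.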

When caching doesn't help

Three scenarios:

1. Batch processing. You generate 100 independent reports. Each is fresh context → cache doesn't help. Choice: Sonnet 4.6 (cheaper nominally).

2. Very long idle. A cron job every 6 hours → 5-min TTL is irrelevant. Every call is fresh cost. Choice: Sonnet 4.6 or Haiku.

3. Hard cap on costs. If you budget per call (a queue with a fixed budget), cache surprises are unacceptable. Sonnet 4.6.

The rule I apply

Use case	Model
Active session 5-60 min	Opus 4.7
Cron every < 5 min	Opus 4.7 (cache hits usually)
Cron every > 5 min	Sonnet 4.6
Batch processing	Sonnet 4.6
Quick ad-hoc query	Opus 4.7 with `/fast` (Opus 4.6 fast mode)
Background agent with > 5 min idle	Sonnet 4.6
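The table reduces to two questions: is the work an independent batch, and is the typical gap between calls shorter than the cache TTL? A small sketch of that rule; `pick_model` is an illustrative helper, the ad-hoc query row is left out, and a gap of exactly 5 minutes (not covered by the table) is treated as a miss.

```python
TTL_MINUTES = 5  # cache TTL from the article


def pick_model(gap_minutes, independent_batch=False):
    """Pick a model from the typical gap between calls, following the table."""
    if independent_batch:
        return "Sonnet 4.6"   # fresh context every time, cache cannot help
    if gap_minutes < TTL_MINUTES:
        return "Opus 4.7"     # calls land inside the TTL, mostly cache hits
    return "Sonnet 4.6"       # every call is a cold start


print(pick_model(1))                          # active session
print(pick_model(360))                        # cron every 6 hours
print(pick_model(1, independent_batch=True))  # batch processing
```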

Per-token price is not the same as per-task price. With prompt caching Opus 4.7 is cheaper than it looks. Measure first, then optimize your cadence, not your model choice.