Prompt caching — how I calculate Opus 4.7 costs (and why the bill isn't scary)
Opus 4.7 looks expensive. Five minutes of prompt caching cuts cost by 90%. Showing how it works, how to measure, and which cadence patterns actually move the bill.

Opus 4.7 has a higher per-token price than Sonnet 4.6. Looking at the price sheet, twice as much. Looking at the bill after a month, the gap is small. The secret is prompt caching, the mechanism that changes long-session economics.
How the cache works
The Anthropic API has supported prompt caching for a while. In Claude Code it's on by default. Cache TTL = 5 minutes.
When you send a prompt:
- The system checks if the initial part of context (system prompt + tools + conversation history) was used recently
- If yes (cache hit) → you pay 10% of standard price for cached tokens
- If no (cache miss) → you pay 125% (cache write cost)
So the first message after a pause is more expensive, every following one in the 5-minute window, cheap.
Concrete bill
A 30-minute session with 8 turns:
| Turn | Time since last | Cache state | Prompt cost |
|---|---|---|---|
| 1 | start | miss | 125% (write) |
| 2 | 30s | hit | 10% |
| 3 | 1min | hit | 10% |
| 4 | 2min | hit | 10% |
| 5 | 6min | miss (TTL) | 125% (write) |
| 6 | 1min | hit | 10% |
| 7 | 2min | hit | 10% |
| 8 | 1min | hit | 10% |
2 cache writes (expensive) + 6 cache hits (cheap). Average prompt cost ~30% of standard price. Real data from my billing: a typical 30-minute session is $0.40-0.60 instead of expected $1.50-2.00.
What breaks the cache
Three most common surprise-cost causes:
1. Idle pause > 5 min. Agent waits, I go for coffee, come back, cache dead. Cold start = full cost.
2. System prompt edit. Changing CLAUDE.md mid-session or /model flip → entire cache invalidated.
3. Long context > 200k. Above 200k tokens the cache behaves differently (sliding window) and some cadence patterns don't catch. Practical consequence: on big tasks I plan context up front.
Cache-hold strategy
What I do to keep the cache working for me:
1. Decouple idle work. If I know I have to step away for 10 minutes, I end the session, start a new one when I return. I don't leave Claude "waiting". The cold start costs once, but everything after is cheap.
2. Batch decisions. Instead of 5 prompts at 1-minute intervals, I formulate 1 longer prompt with 5 questions. Fewer prompts = fewer cache writes.
3. I skip the small stuff. Tiny acknowledgments ("ok", "yes", "next") cause a cache hit. They don't cost much, but they don't push the TTL either, TTL counts from the last cache write, not the last prompt.
4. Plan mode for big tasks. Plan mode builds full context in one phase. After it's done I have a fresh cache for the implementation phase.
Measuring in practice
The Anthropic console shows per-call cost. I check weekly:
Sessions > 10 min: avg $0.45 per session
Sessions 5-10 min: avg $0.30 per session
Sessions < 5 min: avg $0.20 per sessionShorter sessions are cheaper nominally, but the cache write dominates. For a 30-minute session the per-minute cost drops noticeably.
Anti-pattern: "more often, shorter"
Intuition: shorter sessions are cheaper, so make them shorter. NOT true. Shorter sessions pay cache write proportionally more.
Better: one 30-minute session than 6 × 5-minute ones in an hour.
When caching doesn't help
Three scenarios:
1. Batch processing. You generate 100 independent reports. Each is fresh context → cache doesn't help. Choice: Sonnet 4.6 (cheaper nominally).
2. Very long idle. A cron job every 6 hours → 5-min TTL is irrelevant. Every call is fresh cost. Choice: Sonnet 4.6 or Haiku.
3. Hard cap on costs. If you bill per-call (queue with a budget), cache surprises are infeasible. Sonnet 4.6.
The rule I apply
| Use case | Model |
|---|---|
| Active session 5-60 min | Opus 4.7 |
| Cron every < 5 min | Opus 4.7 (cache hits usually) |
| Cron every > 5 min | Sonnet 4.6 |
| Batch processing | Sonnet 4.6 |
| Quick ad-hoc query | Opus 4.7 with /fast (Opus 4.6 fast mode) |
| Background agent with > 5 min idle | Sonnet 4.6 |
Per-token price is not the same as per-task price. With prompt caching Opus 4.7 is cheaper than it looks. Measure, optimize cadence, not models.