Anthropic drops the 2x long-context surcharge: what Claude now costs at 1M tokens
On March 13, Anthropic removed the pricing penalty that doubled your bill when prompts crossed 200K tokens. Claude Sonnet 4.6 and Opus 4.6 now charge flat rates up to 1M tokens. GPT-5.4 and Gemini 3.1 Pro still don't.

TL;DR
- **What changed:** Anthropic removed the 2x input / 1.5x output surcharge that applied when prompts crossed 200K tokens. Effective March 13, 2026.
- **The part most coverage missed:** The old system repriced the entire request, not just the tokens above 200K. Crossing the threshold meant token 1 through token 500,000 all billed at 2x.
- **New flat pricing:** Sonnet 4.6 stays at $3/M input, Opus 4.6 at $5/M input, regardless of context length up to 1M tokens.
- **Competitors:** GPT-5.4 has a 272K cliff that doubles input costs. Gemini 3.1 Pro has a 200K cliff. Claude is now the only major provider with flat 1M-context pricing on flagship models.
- **With prompt caching:** A 1M token knowledge base with 90% cache hits costs around $0.57 per request on Sonnet 4.6 - down from $3.00 without caching.
How the old pricing worked
For months, Anthropic charged a premium for extended-context requests. When your prompt crossed 200K input tokens, the rate jumped: 2x on input tokens, 1.5x on output tokens. Sonnet 4.6 went from $3/M to $6/M input. Opus 4.6 went from $5/M to $10/M input.
The part most coverage missed: it wasn't a marginal surcharge on overflow tokens. The premium applied to the entire request. Send 201K tokens and every single one - including the first 200K - billed at the inflated rate. A 700K-token legal document review on Sonnet 4.6 cost $4.20 in input alone, not the $2.10 you'd expect from the advertised $3/M rate.
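To make the cliff concrete, here is a minimal sketch of the two billing rules for Sonnet 4.6 input tokens. The rates and threshold come from this article; the function names and structure are illustrative, not Anthropic's actual billing code.

```python
# Sketch of old vs. new input billing for Claude Sonnet 4.6.
# Rates are from the article; everything else is illustrative.

SONNET_INPUT_PER_M = 3.00  # $/M input tokens, standard rate
OLD_MULTIPLIER = 2.0       # input surcharge past 200K tokens
THRESHOLD = 200_000

def old_input_cost(tokens: int) -> float:
    """Pre-March-13 rule: crossing 200K repriced the ENTIRE request."""
    rate = SONNET_INPUT_PER_M * (OLD_MULTIPLIER if tokens > THRESHOLD else 1.0)
    return tokens / 1_000_000 * rate

def new_input_cost(tokens: int) -> float:
    """Current rule: flat $3/M at any length up to 1M tokens."""
    return tokens / 1_000_000 * SONNET_INPUT_PER_M

print(old_input_cost(700_000))  # 4.2 - every token billed at $6/M
print(new_input_cost(700_000))  # 2.1 - what the headline rate implies
print(old_input_cost(200_000))  # 0.6 - just under the cliff
print(old_input_cost(200_001))  # 1.200006 - one extra token doubles the bill
```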
Accessing the 1M window also required passing a context-1m-2025-08-07 beta header, and in many cases developers running long-context workloads were paying the premium without realizing it. That header is now irrelevant: flat pricing applies automatically, and the 1M window is generally available.
Current Claude pricing at all context lengths
As of March 13, Opus 4.6 and Sonnet 4.6 have flat pricing across their full 1M token context window. Haiku 4.5 was not part of this change - its context window is 200K, so the long-context question doesn't apply to it.
| Model | Input / 1M | Output / 1M | Cache read / 1M | Batch input / 1M | Context |
|---|---|---|---|---|---|
| Claude Opus 4.6 | $5.00 | $25.00 | $0.50 | $2.50 | 1M |
| Claude Sonnet 4.6 | $3.00 | $15.00 | $0.30 | $1.50 | 1M |
| Claude Haiku 4.5 | $1.00 | $5.00 | $0.10 | $0.50 | 200K |
All rates flat across the full context window with no surcharge at any threshold. Data from Anthropic's pricing page, retrieved March 24, 2026.
GPT-5.4 and Gemini 3.1 Pro still have pricing cliffs
OpenAI and Google both still charge more for long-context requests, and both use the same mechanism Anthropic just abandoned: crossing a threshold reprices the entire request, not just the overflow.
| Model | Threshold | Below / 1M input | Above / 1M input | Multiplier |
|---|---|---|---|---|
| Claude Sonnet 4.6 | None | $3.00 | $3.00 | 1x (flat) |
| Claude Opus 4.6 | None | $5.00 | $5.00 | 1x (flat) |
| GPT-5.4 | 272K | $2.50 | $5.00 | 2x |
| Gemini 3.1 Pro | 200K | $2.00 | $4.00 | 2x |
| DeepSeek V3.2 | None | $0.28 | N/A (163K max) | Flat |
DeepSeek V3.2 is technically flat-priced, but with a 163K context window its requests never reach the old thresholds in the first place. If your workload fits in 163K tokens, DeepSeek is still the cheapest option by a wide margin at $0.28/M. For anything requiring more context, the Anthropic change materially shifts the comparison.
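The same cliff logic generalizes across providers. Here is a sketch using the rates from the table above (DeepSeek is omitted because it never reaches a threshold); the table-of-rates structure and the function are illustrative assumptions, not any provider's actual billing code.

```python
# Input-cost comparison using the rates in the table above.
# Cliff behavior: crossing the threshold reprices the whole request.

MODELS = {
    # name: (base $/M input, threshold in tokens or None, multiplier above)
    "Claude Sonnet 4.6": (3.00, None, 1.0),
    "Claude Opus 4.6": (5.00, None, 1.0),
    "GPT-5.4": (2.50, 272_000, 2.0),
    "Gemini 3.1 Pro": (2.00, 200_000, 2.0),
}

def input_cost(model: str, tokens: int) -> float:
    base, threshold, mult = MODELS[model]
    over = threshold is not None and tokens > threshold
    return tokens / 1_000_000 * base * (mult if over else 1.0)

for name in MODELS:
    print(f"{name}: ${input_cost(name, 1_000_000):.2f} for 1M input tokens")
# Claude Sonnet 4.6: $3.00, Claude Opus 4.6: $5.00,
# GPT-5.4: $5.00, Gemini 3.1 Pro: $4.00
```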
What this actually costs per request
Several workload types regularly push past 200K tokens; the sketch below works through per-request costs (not monthly totals) for three illustrative examples.
Run your own workload numbers with our cost calculator.
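A sketch of the per-request math at the new flat rates. The token counts are our assumptions, chosen to match the use cases discussed later in this piece; the rates come from Anthropic's pricing table above.

```python
# Illustrative per-request INPUT costs at the new flat rates.
RATES = {"Sonnet 4.6": 3.00, "Opus 4.6": 5.00}  # $/M input tokens

SCENARIOS = {  # token counts are assumptions, not Anthropic's figures
    "Full codebase analysis": 800_000,
    "Legal review across filings": 700_000,
    "Multi-document research synthesis": 400_000,
}

for scenario, tokens in SCENARIOS.items():
    for model, rate in RATES.items():
        now = tokens / 1_000_000 * rate
        # Old rule: whole-request 2x input surcharge past 200K tokens.
        before = now * 2 if tokens > 200_000 else now
        print(f"{scenario} on {model}: ${now:.2f} now (was ${before:.2f})")
```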
Prompt caching makes the math even better
The flat pricing and prompt caching stack well. If you're loading the same large document, codebase, or knowledge base across many requests, caching cuts the per-request cost significantly.
Cache hits on Sonnet 4.6 are $0.30/M - 90% cheaper than the $3.00/M standard input rate. For an agent that reads the same large codebase repeatedly throughout the day, the difference between cold and cached loading is significant. The flat pricing means there's no additional surcharge stacked on top of those cache reads.
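The $0.57 figure from the TL;DR falls out of a simple blend of the two rates. A sketch, assuming a 1M-token prompt with a 90% cache hit rate and ignoring cache-write costs, which amortize away once the cache is warm across many requests:

```python
# Blended per-request input cost with prompt caching on Sonnet 4.6.
INPUT_RATE = 3.00  # $/M, standard input (from the pricing table)
CACHE_RATE = 0.30  # $/M, cache read (from the pricing table)

def cached_input_cost(total_tokens: int, hit_rate: float) -> float:
    """Blended input cost per request; cache-write costs ignored."""
    cached = total_tokens * hit_rate
    fresh = total_tokens - cached
    return (cached * CACHE_RATE + fresh * INPUT_RATE) / 1_000_000

print(cached_input_cost(1_000_000, 0.90))  # 0.57, vs 3.00 with no cache hits
```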
Who this actually affects
The change matters most for use cases that regularly push past 200K tokens. For typical chat or short-context tasks, nothing changes.
**Benefits most:**
- Full codebase analysis (500K-1M tokens)
- Legal document review across multiple filings
- Agentic workflows with long working memory
- Multi-document research synthesis
- Large batch image/PDF analysis (up to 600 images now)

**Unaffected:**
- Chat and short-context tasks (under 200K)
- Workloads where RAG already works well
- Corpora larger than 1M tokens (still need retrieval)
- High-throughput pipelines (DeepSeek still cheaper at scale)
- Haiku 4.5 users (200K context, not part of this change)
One use case that makes more sense now: loading a full knowledge base directly instead of building a retrieval pipeline, when the corpus fits under 1M tokens and query volume is low enough that repeated caching stays cost-effective. RAG still wins for millions of queries per day or corpora that don't fit in context.
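Whether to load the corpus directly or build retrieval ends up being per-query arithmetic. A rough sketch: the RAG prompt size here is a pure assumption for illustration, and retrieval quality is ignored entirely.

```python
# Rough per-query comparison: cached full-context vs. a retrieval pipeline.
INPUT_RATE, CACHE_RATE = 3.00, 0.30  # Sonnet 4.6, $/M tokens

# Cached full-context query: 1M tokens, 90% cache hits (as computed above).
full_context = (900_000 * CACHE_RATE + 100_000 * INPUT_RATE) / 1_000_000  # $0.57

# RAG query: small retrieved prompt. 8K tokens is an assumed figure.
rag = 8_000 / 1_000_000 * INPUT_RATE  # ~$0.024

print(f"full context: ${full_context:.3f}/query, RAG: ${rag:.3f}/query")
# The ~24x per-query gap dominates at high volume; at low volume, the cost
# of building and maintaining the retrieval pipeline can outweigh it.
```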
What this means in practice
For a while, "1M context" was a feature with an asterisk. The advertised rate was $3/M, but anything above 200K tokens actually cost $6/M - and the penalty hit all tokens in the request, not just the overflow. That's gone.
Claude Sonnet 4.6 at 1M tokens now costs $3 flat. GPT-5.4 at 1M tokens costs $5 at the cliff rate. Gemini 3.1 Pro at 1M tokens costs $4 at its cliff rate. For long-context workloads, the pricing comparison has shifted compared to what most developers assumed going into 2026.
See where every model sits on our pricing page, or plug in your specific workload with the cost calculator.