Industry · March 24, 2026 · 7 min read

Anthropic drops the 2x long-context surcharge: what Claude now costs at 1M tokens

On March 13, Anthropic removed the pricing penalty that doubled your bill when prompts crossed 200K tokens. Claude Sonnet 4.6 and Opus 4.6 now charge flat rates up to 1M tokens. GPT-5.4 and Gemini 3.1 Pro still don't.

Anthropic 1M context window at flat pricing - Claude Sonnet 4.6 and Opus 4.6

Image source: Anthropic

TL;DR

  • What changed: Anthropic removed the 2x input / 1.5x output surcharge that applied when prompts crossed 200K tokens. Effective March 13, 2026.
  • The part most coverage missed: The old system repriced the entire request, not just the tokens above 200K. Crossing the threshold meant token 1 through token 500,000 all billed at 2x.
  • New flat pricing: Sonnet 4.6 stays at $3/M input, Opus 4.6 at $5/M input, regardless of context length up to 1M tokens.
  • Competitors: GPT-5.4 has a 272K cliff that doubles input costs. Gemini 3.1 Pro has a 200K cliff. Claude is now the only major provider with flat 1M-context pricing on flagship models.
  • With prompt caching: A 1M token knowledge base with 90% cache hits costs around $0.57 per request on Sonnet 4.6 - down from $3.00 without caching.

How the old pricing worked

For months, Anthropic charged a premium for extended-context requests. When your prompt crossed 200K input tokens, the rate jumped: 2x on input tokens, 1.5x on output tokens. Sonnet 4.6 went from $3/M to $6/M input. Opus 4.6 went from $5/M to $10/M input.

The part most coverage missed: it wasn't a marginal surcharge on overflow tokens. The premium applied to the entire request. Send 201K tokens and every single one - including the first 200K - billed at the inflated rate. A 700K-token legal document review on Sonnet 4.6 cost $4.20 in input alone, not the $2.10 you'd expect from the advertised $3/M rate.

Accessing 1M context also required passing a context-1m-2025-08-07 beta header. In many cases, developers running long-context workloads were paying the premium without realizing it. That header is now irrelevant - the flat pricing applies automatically and the 1M window is generally available.
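The whole-request repricing can be sketched in a few lines. This is a simplified model using the Sonnet 4.6 rates and the 200K threshold described above, not Anthropic's actual billing code; the function name is ours:

```python
def old_input_cost(input_tokens, base_rate_per_m=3.00,
                   threshold=200_000, multiplier=2.0):
    """Old scheme: crossing the threshold repriced EVERY token in the
    request at base_rate * multiplier, not just the overflow."""
    rate = base_rate_per_m
    if input_tokens > threshold:
        rate = base_rate_per_m * multiplier
    return input_tokens / 1_000_000 * rate

# One token over the threshold doubled the entire input bill:
print(round(old_input_cost(200_000), 2))  # 0.6  -- $0.60 at the $3/M base rate
print(round(old_input_cost(200_001), 2))  # 1.2  -- whole request jumps to $6/M
print(round(old_input_cost(700_000), 2))  # 4.2  -- the legal-review example above
```

A marginal scheme would have billed only the 1 overflow token at the higher rate; the cliff is what made the old pricing so easy to trip over.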

Current Claude pricing at all context lengths

As of March 13, Opus 4.6 and Sonnet 4.6 have flat pricing across their full 1M token context window. Haiku 4.5 was not part of this change - its context window is 200K, so the long-context question doesn't apply to it.

| Model | Input / 1M | Output / 1M | Cache read / 1M | Batch input / 1M | Context |
|---|---|---|---|---|---|
| Claude Opus 4.6 | $5.00 | $25.00 | $0.50 | $2.50 | 1M |
| Claude Sonnet 4.6 | $3.00 | $15.00 | $0.30 | $1.50 | 1M |
| Claude Haiku 4.5 | $1.00 | $5.00 | $0.10 | $0.50 | 200K |

All rates flat across the full context window with no surcharge at any threshold. Data from Anthropic's pricing page, retrieved March 24, 2026.

GPT-5.4 and Gemini 3.1 Pro still have pricing cliffs

OpenAI and Google both still charge more for long-context requests, and both use the same mechanism Anthropic just abandoned: crossing a threshold reprices the entire request, not just the overflow.

| Model | Threshold | Below / 1M input | Above / 1M input | Multiplier |
|---|---|---|---|---|
| Claude Sonnet 4.6 | None | $3.00 | $3.00 | 1x (flat) |
| Claude Opus 4.6 | None | $5.00 | $5.00 | 1x (flat) |
| GPT-5.4 | 272K | $2.50 | $5.00 | 2x |
| Gemini 3.1 Pro | 200K | $2.00 | $4.00 | 2x |
| DeepSeek V3.2 | None | $0.28 | N/A (163K max) | Flat |

DeepSeek V3.2 is technically flat-priced, but its 163K context window means it never reaches the old thresholds anyway. If your workload fits in 163K tokens, DeepSeek is still the cheapest option by a wide margin at $0.28/M input. For anything requiring more context, the Anthropic change materially shifts the comparison.
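The cliff mechanics in the table reduce to one function. A sketch using the rates and thresholds from the table above - not any provider's official billing logic:

```python
def input_cost(tokens, base_rate, threshold=None, multiplier=2.0):
    """Per-request input cost in dollars. If a threshold exists and the
    request crosses it, the ENTIRE request bills at base_rate * multiplier."""
    rate = base_rate
    if threshold is not None and tokens > threshold:
        rate = base_rate * multiplier
    return tokens / 1_000_000 * rate

MODELS = {
    "Claude Sonnet 4.6": dict(base_rate=3.00, threshold=None),
    "Claude Opus 4.6":   dict(base_rate=5.00, threshold=None),
    "GPT-5.4":           dict(base_rate=2.50, threshold=272_000),
    "Gemini 3.1 Pro":    dict(base_rate=2.00, threshold=200_000),
}

# A 500K-token prompt crosses both competitor cliffs:
for name, params in MODELS.items():
    print(f"{name}: ${input_cost(500_000, **params):.2f}")
# Claude Sonnet 4.6: $1.50
# Claude Opus 4.6: $2.50
# GPT-5.4: $2.50
# Gemini 3.1 Pro: $2.00
```

Note the quirk at 500K: GPT-5.4's doubled rate ($5/M) makes it exactly as expensive as flat-priced Opus 4.6, despite a lower advertised base rate.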

What this actually costs per request

Three scenarios that regularly push past 200K tokens. These are per-request costs, not monthly totals.

Legal document review: 700K input + 5K output (12 contracts, roughly 847 pages)
  • Claude Sonnet 4.6 (new, flat): $2.18
  • Claude Sonnet 4.6 (old, 2x surcharge): $4.31
  • Gemini 3.1 Pro (above 200K cliff): $2.89
  • GPT-5.4 (above 272K cliff): $3.61
Sonnet new: (0.7M x $3) + (0.005M x $15) = $2.10 + $0.075 = $2.18

Full codebase scan: 1M input + 20K output (large monorepo, full analysis pass)
  • Claude Sonnet 4.6 (new, flat): $3.30
  • Claude Sonnet 4.6 (old, 2x surcharge): $6.45
  • Gemini 3.1 Pro (above 200K cliff): $4.36
  • GPT-5.4 (above 272K cliff): $5.45
Sonnet new: (1.0M x $3) + (0.02M x $15) = $3.00 + $0.30 = $3.30

Research synthesis: 400K input + 10K output (papers, earnings calls, or filings combined)
  • Claude Sonnet 4.6 (new, flat): $1.35
  • Claude Sonnet 4.6 (old, 2x surcharge): $2.63
  • Gemini 3.1 Pro (above 200K cliff): $1.78
  • GPT-5.4 (above 272K cliff): $2.15
Sonnet new: (0.4M x $3) + (0.01M x $15) = $1.20 + $0.15 = $1.35
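The Sonnet 4.6 line items above follow directly from the flat rates. A quick sketch to reproduce them (rates from the pricing table; the function name is ours):

```python
def sonnet_cost(input_tokens, output_tokens,
                input_rate=3.00, output_rate=15.00):
    """Flat Sonnet 4.6 pricing: same rate at any context length up to 1M."""
    return (input_tokens / 1_000_000 * input_rate
            + output_tokens / 1_000_000 * output_rate)

legal    = sonnet_cost(700_000, 5_000)      # 2.175, i.e. ~$2.18
codebase = sonnet_cost(1_000_000, 20_000)   # $3.30
research = sonnet_cost(400_000, 10_000)     # $1.35
```

No threshold check anywhere - that is the whole change. The old scheme needed a branch that swapped in 2x input and 1.5x output rates once `input_tokens` crossed 200K.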

Run your own workload numbers with our cost calculator.

Prompt caching makes the math even better

The flat pricing and prompt caching stack well. If you're loading the same large document, codebase, or knowledge base across many requests, caching cuts the per-request cost significantly.

Example: 1M token knowledge base, repeated requests (Sonnet 4.6)
  • First request (cold, full 1M tokens): $3.00 input (1.0M x $3.00/M)
  • Follow-up (90% cached: 900K hit + 100K new): $0.57 input ((0.9M x $0.30) + (0.1M x $3.00) = $0.27 + $0.30)
  • Savings vs repeated cold loads: 81% less ($0.57 vs $3.00 per request)

Cache hits on Sonnet 4.6 are $0.30/M - 90% cheaper than the $3.00/M standard input rate. For an agent that reads the same large codebase repeatedly throughout the day, the difference between cold and cached loading is significant. The flat pricing means there's no additional surcharge stacked on top of those cache reads.
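The cache arithmetic is simple enough to sketch. This mirrors the article's simplified example - cache-write premiums on the first request are ignored, and the rates are the Sonnet 4.6 figures above:

```python
def cached_input_cost(total_tokens, cache_hit_fraction,
                      base_rate=3.00, cache_read_rate=0.30):
    """Input cost when a fraction of the prompt is served from cache
    (Sonnet 4.6: $3.00/M standard input, $0.30/M cache reads)."""
    cached = total_tokens * cache_hit_fraction
    fresh = total_tokens - cached
    return (cached / 1_000_000 * cache_read_rate
            + fresh / 1_000_000 * base_rate)

cold = cached_input_cost(1_000_000, 0.0)   # $3.00 -- no cache
warm = cached_input_cost(1_000_000, 0.9)   # $0.57 -- 90% hit rate
print(f"savings: {1 - warm / cold:.0%}")   # savings: 81%
```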

Who this actually affects

The change matters most for use cases that regularly push past 200K tokens. For typical chat or short-context tasks, nothing changes.

Meaningful cost reduction
  • Full codebase analysis (500K-1M tokens)
  • Legal document review across multiple filings
  • Agentic workflows with long working memory
  • Multi-document research synthesis
  • Large batch image/PDF analysis (up to 600 images now)
No real change
  • Chat and short-context tasks (under 200K)
  • Workloads where RAG already works well
  • Corpora larger than 1M tokens (still need retrieval)
  • High-throughput pipelines (DeepSeek still cheaper at scale)
  • Haiku 4.5 users (200K context, not part of this change)

One use case that makes more sense now: loading a full knowledge base directly instead of building a retrieval pipeline, when the corpus fits under 1M tokens and query volume is low enough that repeated caching stays cost-effective. RAG still wins for millions of queries per day or corpora that don't fit in context.
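One way to weigh that tradeoff is amortized cost per query. A hedged sketch using the article's own Sonnet 4.6 numbers (one cold load at $3.00, cached follow-ups at $0.57; cache-write premiums and cache expiry are ignored, so treat this as a lower bound):

```python
def amortized_cost_per_request(n_requests, cold=3.00, warm=0.57):
    """Average input cost per request: one cold load, then cached follow-ups."""
    if n_requests < 1:
        raise ValueError("need at least one request")
    return (cold + (n_requests - 1) * warm) / n_requests

for n in (1, 10, 100):
    print(n, round(amortized_cost_per_request(n), 3))
# 1 3.0
# 10 0.813
# 100 0.594
```

The average converges toward the $0.57 warm cost quickly, which is why low-to-moderate query volume is the sweet spot; at RAG-scale volumes even $0.57 per query dwarfs a retrieval pipeline's per-query cost.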

What this means in practice

For a while, "1M context" was a feature with an asterisk. The advertised rate was $3/M, but anything above 200K tokens actually cost $6/M - and the penalty hit all tokens in the request, not just the overflow. That's gone.

Claude Sonnet 4.6 at 1M tokens now costs $3 flat. GPT-5.4 at 1M tokens costs $5 at the cliff rate. Gemini 3.1 Pro at 1M tokens costs $4 at its cliff rate. For long-context workloads, the pricing comparison has shifted compared to what most developers assumed going into 2026.

See where every model sits on our pricing page, or plug in your specific workload with the cost calculator.

Sources