Anthropic drops the 2x long-context surcharge: what Claude now costs at 1M tokens
On March 13, Anthropic removed the pricing penalty that doubled your bill when prompts crossed 200K tokens. Claude Sonnet 4.6 and Opus 4.6 now charge flat rates up to 1M tokens. GPT-5.4 and Gemini 3.1 Pro still don't.

TL;DR
- **What changed:** Anthropic removed the 2x input / 1.5x output surcharge that applied when prompts crossed 200K tokens. Effective March 13, 2026.
- **The part most coverage missed:** The old system repriced the entire request, not just the tokens above 200K. Crossing the threshold meant token 1 through token 500,000 all billed at 2x.
- **New flat pricing:** Sonnet 4.6 stays at $3/M input, Opus 4.6 at $5/M input, regardless of context length up to 1M tokens.
- **Competitors:** GPT-5.4 has a 272K cliff that doubles input costs. Gemini 3.1 Pro has a 200K cliff. Claude is now the only major provider with flat 1M-context pricing on flagship models.
- **With prompt caching:** A 1M token knowledge base with 90% cache hits costs around $0.57 per request on Sonnet 4.6 - down from $3.00 without caching.
How the old pricing worked
For months, Anthropic charged a premium for extended-context requests. When your prompt crossed 200K input tokens, the rate jumped: 2x on input tokens, 1.5x on output tokens. Sonnet 4.6 went from $3/M to $6/M input. Opus 4.6 went from $5/M to $10/M input.
The part most coverage missed: it wasn't a marginal surcharge on overflow tokens. The premium applied to the entire request. Send 201K tokens and every single one - including the first 200K - billed at the inflated rate. A 700K-token legal document review on Sonnet 4.6 cost $4.20 in input alone, not the $2.10 you'd expect from the advertised $3/M rate.
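To make the cliff concrete, here is a minimal sketch of the two billing rules for Sonnet 4.6 input tokens. The rates and threshold come from this article; the function names and structure are illustrative, not Anthropic's actual billing code.

```python
# Sketch of old vs. new input billing for Claude Sonnet 4.6.
# Rates are from the article; everything else is illustrative.

SONNET_INPUT_PER_M = 3.00  # $/M input tokens, standard rate
OLD_MULTIPLIER = 2.0       # input surcharge past 200K tokens
THRESHOLD = 200_000

def old_input_cost(tokens: int) -> float:
    """Pre-March-13 rule: crossing 200K repriced the ENTIRE request."""
    rate = SONNET_INPUT_PER_M * (OLD_MULTIPLIER if tokens > THRESHOLD else 1.0)
    return tokens / 1_000_000 * rate

def new_input_cost(tokens: int) -> float:
    """Current rule: flat $3/M at any length up to 1M tokens."""
    return tokens / 1_000_000 * SONNET_INPUT_PER_M

print(old_input_cost(700_000))  # 4.2 - every token billed at $6/M
print(new_input_cost(700_000))  # 2.1 - what the headline rate implies
print(old_input_cost(200_000))  # 0.6 - just under the cliff
print(old_input_cost(200_001))  # 1.200006 - one extra token doubles the bill
```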
Accessing the 1M window also required passing a context-1m-2025-08-07 beta header, and in many cases developers running long-context workloads were paying the premium without realizing it. That header is now irrelevant: flat pricing applies automatically, and the 1M window is generally available.
Current Claude pricing at all context lengths
As of March 13, Opus 4.6 and Sonnet 4.6 have flat pricing across their full 1M token context window. Haiku 4.5 was not part of this change - its context window is 200K, so the long-context question doesn't apply to it.
| Model | Input / 1M | Output / 1M | Cache read / 1M | Batch input / 1M | Context |
|---|---|---|---|---|---|
| Claude Opus 4.6 | $5.00 | $25.00 | $0.50 | $2.50 | 1M |
| Claude Sonnet 4.6 | $3.00 | $15.00 | $0.30 | $1.50 | 1M |
| Claude Haiku 4.5 | $1.00 | $5.00 | $0.10 | $0.50 | 200K |
All rates flat across the full context window with no surcharge at any threshold. Data from Anthropic's pricing page, retrieved March 24, 2026.
GPT-5.4 and Gemini 3.1 Pro still have pricing cliffs
OpenAI and Google both still charge more for long-context requests, and both use the same mechanism Anthropic just abandoned: crossing a threshold reprices the entire request, not just the overflow.
| Model | Threshold | Below / 1M input | Above / 1M input | Multiplier |
|---|---|---|---|---|
| Claude Sonnet 4.6 | None | $3.00 | $3.00 | 1x (flat) |
| Claude Opus 4.6 | None | $5.00 | $5.00 | 1x (flat) |
| GPT-5.4 | 272K | $2.50 | $5.00 | 2x |
| Gemini 3.1 Pro | 200K | $2.00 | $4.00 | 2x |
| DeepSeek V3.2 | None | $0.28 | N/A (163K max) | Flat |
DeepSeek V3.2 is technically flat-priced, but with a 163K context window its requests never reach the old thresholds in the first place. If your workload fits in 163K tokens, DeepSeek is still the cheapest option by a wide margin at $0.28/M. For anything requiring more context, the Anthropic change materially shifts the comparison.
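The same cliff logic generalizes across providers. Here is a sketch using the rates from the table above (DeepSeek is omitted because it never reaches a threshold); the table-of-rates structure and the function are illustrative assumptions, not any provider's actual billing code.

```python
# Input-cost comparison using the rates in the table above.
# Cliff behavior: crossing the threshold reprices the whole request.

MODELS = {
    # name: (base $/M input, threshold in tokens or None, multiplier above)
    "Claude Sonnet 4.6": (3.00, None, 1.0),
    "Claude Opus 4.6": (5.00, None, 1.0),
    "GPT-5.4": (2.50, 272_000, 2.0),
    "Gemini 3.1 Pro": (2.00, 200_000, 2.0),
}

def input_cost(model: str, tokens: int) -> float:
    base, threshold, mult = MODELS[model]
    over = threshold is not None and tokens > threshold
    return tokens / 1_000_000 * base * (mult if over else 1.0)

for name in MODELS:
    print(f"{name}: ${input_cost(name, 1_000_000):.2f} for 1M input tokens")
# Claude Sonnet 4.6: $3.00, Claude Opus 4.6: $5.00,
# GPT-5.4: $5.00, Gemini 3.1 Pro: $4.00
```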
What this actually costs per request
Several workload types regularly push past 200K tokens; the sketch below works through per-request costs (not monthly totals) for three illustrative examples.
Run your own workload numbers with our cost calculator.
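A sketch of the per-request math at the new flat rates. The token counts are our assumptions, chosen to match the use cases discussed later in this piece; the rates come from Anthropic's pricing table above.

```python
# Illustrative per-request INPUT costs at the new flat rates.
RATES = {"Sonnet 4.6": 3.00, "Opus 4.6": 5.00}  # $/M input tokens

SCENARIOS = {  # token counts are assumptions, not Anthropic's figures
    "Full codebase analysis": 800_000,
    "Legal review across filings": 700_000,
    "Multi-document research synthesis": 400_000,
}

for scenario, tokens in SCENARIOS.items():
    for model, rate in RATES.items():
        now = tokens / 1_000_000 * rate
        # Old rule: whole-request 2x input surcharge past 200K tokens.
        before = now * 2 if tokens > 200_000 else now
        print(f"{scenario} on {model}: ${now:.2f} now (was ${before:.2f})")
```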
Prompt caching makes the math even better
The flat pricing and prompt caching stack well. If you're loading the same large document, codebase, or knowledge base across many requests, caching cuts the per-request cost significantly.
Cache hits on Sonnet 4.6 are $0.30/M - 90% cheaper than the $3.00/M standard input rate. For an agent that reads the same large codebase repeatedly throughout the day, the difference between cold and cached loading is significant. The flat pricing means there's no additional surcharge stacked on top of those cache reads.
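The $0.57 figure from the TL;DR falls out of a simple blend of the two rates. A sketch, assuming a 1M-token prompt with a 90% cache hit rate and ignoring cache-write costs, which amortize away once the cache is warm across many requests:

```python
# Blended per-request input cost with prompt caching on Sonnet 4.6.
INPUT_RATE = 3.00  # $/M, standard input (from the pricing table)
CACHE_RATE = 0.30  # $/M, cache read (from the pricing table)

def cached_input_cost(total_tokens: int, hit_rate: float) -> float:
    """Blended input cost per request; cache-write costs ignored."""
    cached = total_tokens * hit_rate
    fresh = total_tokens - cached
    return (cached * CACHE_RATE + fresh * INPUT_RATE) / 1_000_000

print(cached_input_cost(1_000_000, 0.90))  # 0.57, vs 3.00 with no cache hits
```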
Who this actually affects
The change matters most for use cases that regularly push past 200K tokens. For typical chat or short-context tasks, nothing changes.
**Benefits most:**
- Full codebase analysis (500K-1M tokens)
- Legal document review across multiple filings
- Agentic workflows with long working memory
- Multi-document research synthesis
- Large batch image/PDF analysis (up to 600 images now)

**Unaffected:**
- Chat and short-context tasks (under 200K)
- Workloads where RAG already works well
- Corpora larger than 1M tokens (still need retrieval)
- High-throughput pipelines (DeepSeek still cheaper at scale)
- Haiku 4.5 users (200K context, not part of this change)
One use case that makes more sense now: loading a full knowledge base directly instead of building a retrieval pipeline, when the corpus fits under 1M tokens and query volume is low enough that repeated caching stays cost-effective. RAG still wins for millions of queries per day or corpora that don't fit in context.
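Whether to load the corpus directly or build retrieval ends up being per-query arithmetic. A rough sketch: the RAG prompt size here is a pure assumption for illustration, and retrieval quality is ignored entirely.

```python
# Rough per-query comparison: cached full-context vs. a retrieval pipeline.
INPUT_RATE, CACHE_RATE = 3.00, 0.30  # Sonnet 4.6, $/M tokens

# Cached full-context query: 1M tokens, 90% cache hits (as computed above).
full_context = (900_000 * CACHE_RATE + 100_000 * INPUT_RATE) / 1_000_000  # $0.57

# RAG query: small retrieved prompt. 8K tokens is an assumed figure.
rag = 8_000 / 1_000_000 * INPUT_RATE  # ~$0.024

print(f"full context: ${full_context:.3f}/query, RAG: ${rag:.3f}/query")
# The ~24x per-query gap dominates at high volume; at low volume, the cost
# of building and maintaining the retrieval pipeline can outweigh it.
```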
What this means in practice
For a while, "1M context" was a feature with an asterisk. The advertised rate was $3/M, but anything above 200K tokens actually cost $6/M - and the penalty hit all tokens in the request, not just the overflow. That's gone.
Claude Sonnet 4.6 at 1M tokens now costs $3 flat. GPT-5.4 at 1M tokens costs $5 at the cliff rate. Gemini 3.1 Pro at 1M tokens costs $4 at its cliff rate. For long-context workloads, the pricing comparison has shifted compared to what most developers assumed going into 2026.
See where every model sits on our pricing page, or plug in your specific workload with the cost calculator.