Claude Opus 4.7: $5 per million tokens - and what that actually means now
Released April 16, 2026. The per-token rate is unchanged from Opus 4.6. But the new tokenizer can use up to 35% more tokens for the same code or JSON - so your bill depends on what you send, not just how much.

- $5.00/$25.00 per million tokens - same as Opus 4.6. Batch at $2.50/$12.50. Cache hits come in at $0.50/MTok
- New tokenizer: up to 35% more tokens for the same code or JSON. English prose barely changes. This is a token-count increase, not a rate hike
- budget_tokens is gone - adaptive thinking replaced extended thinking. Thinking tokens still bill at $25/MTok, even the ones you never see
- 87.6% on SWE-bench Verified, the highest score on that benchmark publicly reported right now
- 1M context at standard rates, 128k max output (300k via Batch API beta)
The rate card
Anthropic kept the price per token flat. Every tier is identical to Opus 4.6.
| Tier | Input / 1M tokens | Output / 1M tokens |
|---|---|---|
| Standard | $5.00 | $25.00 |
| Batch API (50% off) | $2.50 | $12.50 |
| Cache hit (read) | $0.50 | N/A |
| Cache write, 5-min TTL | $6.25 | N/A |
| Cache write, 1-hr TTL | $10.00 | N/A |
Batch and cache discounts stack. Batch + cache hit = 5% of standard input price. Source: Anthropic pricing docs
The tokenizer change: why the rate card is not the whole story
Anthropic's pricing page includes a small note that matters more than its size suggests: Opus 4.7 uses a new tokenizer, and that tokenizer may produce up to 35% more tokens for the same input text. The per-token price is flat. The token count is not.
The context window section makes this concrete. On Opus 4.6, one million tokens covered roughly 750,000 words. On Opus 4.7, one million tokens covers roughly 555,000 words. That is about 26% less content fitting in the same token budget - or equivalently, the same document now costs up to 35% more tokens to process.
Code and structured data are where you see the full effect. English prose is largely unchanged - the new tokenizer handles standard text similarly to the old one. If your system prompt is a Python file and your user messages are JSON payloads, you are in the 1.3x-1.35x range. Conversational support in English will barely notice the difference.
| Content type | Token multiplier | Effective input cost |
|---|---|---|
| English prose | ~1.0x | ~$5.00/MTok |
| Mixed code and text | ~1.15-1.25x | ~$5.75-6.25/MTok |
| Python / JavaScript code | ~1.2-1.3x | ~$6.00-6.50/MTok |
| JSON / XML / YAML | up to 1.35x | up to $6.75/MTok |
Effective costs based on Anthropic's documented range of 1x to 1.35x token multiplier. Actual impact varies by specific content.
A concrete example. Take a pipeline that processes 1 million words of Python code per day as input. On Opus 4.6, that is roughly 1.33M tokens at $5/MTok, or about $6.65. On Opus 4.7 with a 1.3x multiplier, the same content is 1.73M tokens, or $8.65. That extra $2/day is $730/year - not catastrophic, but real, and entirely invisible from the rate card.
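The arithmetic above generalizes into a few lines of Python. This is an illustrative sketch, not an official tool: the tokens-per-word ratio (~1.33 for code, taken from the example above) and the content-type multipliers are assumptions drawn from Anthropic's documented 1.0-1.35x range.

```python
# Rough input-cost estimator for the Opus 4.6 -> 4.7 tokenizer change.
# Assumptions (not official figures): tokens_per_word ratio and the
# per-content-type multipliers from the table above.

INPUT_RATE = 5.00  # $ per million input tokens, standard tier

def daily_input_cost(words_per_day: float,
                     tokens_per_word: float = 1.33,
                     tokenizer_multiplier: float = 1.0) -> float:
    """Estimated daily input spend in dollars."""
    tokens = words_per_day * tokens_per_word * tokenizer_multiplier
    return tokens / 1_000_000 * INPUT_RATE

old = daily_input_cost(1_000_000)                            # Opus 4.6 baseline
new = daily_input_cost(1_000_000, tokenizer_multiplier=1.3)  # code-heavy 4.7
print(f"Opus 4.6: ${old:.2f}/day, Opus 4.7: ${new:.2f}/day, "
      f"delta ${(new - old) * 365:.0f}/year")
```

Swap in your own words-per-day and multiplier; the point is that the delta scales linearly with volume and is invisible on the rate card.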
The tokenizer change is also why the model improved. A richer vocabulary lets the model reason at a higher level of abstraction. But it is worth auditing your actual token counts on Opus 4.7 before assuming your cost will stay flat.
Adaptive thinking: the billing change you might miss
Extended thinking is gone. If you pass thinking: {type: "enabled", budget_tokens: N} to Opus 4.7, the API returns a 400 error. The replacement is adaptive thinking, enabled with thinking: {type: "adaptive"}.
The billing model is the same: thinking tokens cost $25/MTok (the output rate). But there are two things to watch.
First, you are billed for thinking tokens even when display: "omitted" hides them from your response. Omitting display does not skip the computation - it just stops the tokens from reaching you. The cost is the same either way.
Second, Opus 4.7 adds a new effort level called xhigh - available on this model only. There are now five levels: low, medium, high (the default when adaptive thinking is on), xhigh, and max. Using xhigh or max with complex problems generates substantially more thinking tokens. On a problem that uses 50,000 thinking tokens at default effort, xhigh might reach 80,000-100,000. That is an extra $0.75-1.25 per request at $25/MTok.
| Effort level | Behavior | Availability |
|---|---|---|
| low | Minimizes thinking, may skip entirely | All Claude 4 models |
| medium | Moderate thinking, skips for simple queries | All Claude 4 models |
| high | Always thinks (default when adaptive enabled) | All Claude 4 models |
| xhigh | Deep exploration, always thinks | Opus 4.7 only |
| max | No depth constraints | All Claude 4 models |
If you were using extended thinking with a fixed budget_tokens cap to control costs, adaptive thinking requires a different approach. The model decides how much to think based on the effort level. For cost-sensitive workloads, start at the default high and only use xhigh for tasks that genuinely need the extra depth.
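The migration and its cost implications can be sketched in a few lines. The `thinking` parameter shapes below match what this section describes; the cost helper is a hypothetical illustration of the $25/MTok math, not part of any SDK.

```python
# Before (Opus 4.6, extended thinking): a fixed budget capped thinking cost.
old_thinking = {"type": "enabled", "budget_tokens": 32_000}  # 400 error on 4.7

# After (Opus 4.7, adaptive thinking): the model decides how much to think.
new_thinking = {"type": "adaptive"}

# Hypothetical helper: since budget_tokens no longer caps spend, estimate
# the thinking bill per request. Thinking tokens bill at the output rate.
OUTPUT_RATE = 25.00  # $ per million output tokens

def thinking_cost(thinking_tokens: int) -> float:
    return thinking_tokens / 1_000_000 * OUTPUT_RATE

print(f"50k thinking tokens (default effort): ${thinking_cost(50_000):.2f}")
print(f"100k thinking tokens (xhigh ceiling): ${thinking_cost(100_000):.2f}")
```

The practical replacement for a hard budget cap is monitoring: track thinking-token usage per request and alert when effort levels above the default start dominating your bill.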
Caching: the numbers that matter
Cache hits cost $0.50/MTok - 10% of the standard input rate. If your workload has a large static system prompt, caching brings the effective input cost far below the headline $5/MTok.
Example: a 200,000-token system prompt sent 500 times a day. Without caching: 100M tokens/day at $5/MTok = $500. With caching: first request writes at $6.25/MTok ($1.25), then 499 hits at $0.50/MTok ($49.90 for the cached tokens) = about $51/day total. That is a 90% reduction on the cached portion.
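The example above reduces to a small function. This is the simplified best case - one cache write, then hits - so it ignores TTL expiry, which would add extra writes in a real workload.

```python
# Sketch of the caching math above, using the rate-card numbers.
CACHE_WRITE_5MIN = 6.25   # $/MTok, 5-minute TTL write
CACHE_HIT = 0.50          # $/MTok
STANDARD_INPUT = 5.00     # $/MTok

def daily_prompt_cost(prompt_tokens: int, requests: int, cached: bool) -> float:
    """Daily spend on a static system prompt, in dollars."""
    mtok = prompt_tokens / 1_000_000
    if not cached:
        return mtok * requests * STANDARD_INPUT
    # First request writes the cache; the rest hit it.
    return mtok * CACHE_WRITE_5MIN + mtok * (requests - 1) * CACHE_HIT

print(daily_prompt_cost(200_000, 500, cached=False))  # 500.0
print(daily_prompt_cost(200_000, 500, cached=True))   # ~51.15
```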
There is one interaction to watch: switching between adaptive thinking modes (enabled vs. disabled) breaks cache breakpoints for conversation messages. System prompts and tool definitions remain cached regardless. Keep your thinking mode consistent within a session if caching economics matter.
Batch and cache discounts stack. Batch + cache hit = $2.50 x 10% = $0.25/MTok effective input cost. At that level, the tokenizer overhead is nearly irrelevant.
Benchmark scores
Opus 4.7 leads on coding and agentic tasks. It trails on browser-based research.
| Benchmark | Opus 4.7 | Opus 4.6 | GPT-5.4 | Gemini 3.1 Pro |
|---|---|---|---|---|
| SWE-bench Verified | 87.6% | 80.8% | N/A | 80.6% |
| SWE-bench Pro | 64.3% | 53.4% | 57.7% | 54.2% |
| GPQA Diamond | 94.2% | 91.3% | 94.4%* | 94.3% |
| Terminal-Bench 2.0 | 69.4% | 65.4% | 75.1% | 68.5% |
| MCP-Atlas (tool use) | 77.3% | 75.8% | 68.1% | 73.9% |
| OSWorld-Verified | 78.0% | 72.7% | 75.0% | N/A |
| BrowseComp | 79.3% | 83.7% | 89.3%* | 85.9% |
| CharXiv (visual, tools) | 91.0% | 77.4% | N/A | N/A |
* GPT-5.4 Pro tier. Sources: LLM Stats, BenchLM
The coding numbers are the argument for paying the Opus premium. SWE-bench Verified at 87.6% is roughly 7 points ahead of both Opus 4.6 and Gemini 3.1 Pro. SWE-bench Pro - which uses harder, real production repositories - shows the same gap: 64.3% vs GPT-5.4's 57.7% and Gemini 3.1 Pro's 54.2%. For code agents, that gap is meaningful.
BrowseComp is the weak spot. Opus 4.7 scores 79.3% on web research, while GPT-5.4 Pro hits 89.3% and Gemini 3.1 Pro reaches 85.9%. For browser-based research workloads, GPT-5.4 beats Opus 4.7 on that benchmark at half the standard input cost ($2.50 vs $5.00/MTok).
Cost comparison
| Model | Input / 1M | Output / 1M | Cache hit | Best for |
|---|---|---|---|---|
| Claude Opus 4.7 | $5.00 | $25.00 | $0.50 | Coding agents, complex tool use |
| GPT-5.4 | $2.50 | $15.00 | $1.25 | Web research, broad reasoning |
| Gemini 3.1 Pro | $2.00 | $12.00 | $0.20 | Cost-efficient frontier tasks |
| Claude Sonnet 4.6 | $3.00 | $15.00 | $0.30 | Anthropic quality at lower cost |
At standard pricing, Opus 4.7's input costs 2x GPT-5.4's and 2.5x Gemini 3.1 Pro's. Add the tokenizer overhead on code-heavy workloads and the effective gap vs GPT-5.4 reaches 2.7x. The premium is real.
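The stacked multipliers are easy to check. A two-line sketch, using the rate-card prices and Anthropic's documented 1.35x worst-case tokenizer multiplier:

```python
# How the tokenizer overhead widens the headline price gap vs GPT-5.4.
OPUS_INPUT, GPT_INPUT = 5.00, 2.50  # $/MTok standard input

def effective_gap(tokenizer_multiplier: float) -> float:
    """Effective Opus 4.7 input cost as a multiple of GPT-5.4's."""
    return OPUS_INPUT * tokenizer_multiplier / GPT_INPUT

print(f"{effective_gap(1.0):.1f}x")   # English prose
print(f"{effective_gap(1.35):.1f}x")  # JSON-heavy input, worst case
```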
For workloads that do not specifically need top-tier coding benchmarks, Sonnet 4.6 at $3/$15 is worth testing first. Same 1M context, same Anthropic model family, 40% less per input token.
Other changes from Opus 4.6
Sampling parameters restricted
You can no longer set both temperature and top_p. Pick one - setting both returns a 400 error. Temperature and top_k also cannot both be set to non-default values.
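A hypothetical client-side guard can fail fast before the API returns the 400. The rule set is as described above; the helper name and the assumption that `None` means "not set" are illustrative, not part of any SDK.

```python
# Hypothetical pre-flight check for Opus 4.7's sampling restrictions.
# Assumption: None means "parameter not sent", i.e. API default.

def validate_sampling(temperature=None, top_p=None, top_k=None):
    if temperature is not None and top_p is not None:
        raise ValueError("Opus 4.7: set temperature or top_p, not both")
    if temperature is not None and top_k is not None:
        raise ValueError("Opus 4.7: temperature and top_k cannot both be non-default")

validate_sampling(temperature=0.7)               # fine
# validate_sampling(temperature=0.7, top_p=0.9)  # would raise ValueError
```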
Thinking content hidden by default
On Opus 4.6, thinking output defaulted to 'summarized'. On 4.7 it defaults to 'omitted'. Add display: 'summarized' explicitly if you depend on receiving thinking traces.
Higher-resolution vision
Max image resolution raised from 1.15MP to 3.75MP. Coordinates in computer use tasks are now 1:1 with actual pixels - no scaling needed.
Knowledge cutoff moved up
Reliable knowledge through January 2026, vs May 2025 on Opus 4.6. Eight months of additional training data, which matters for any workload involving recent models, tools, or events.
Task budgets (beta)
New advisory token limit for agentic loops via the task-budgets-2026-03-13 beta header. The model is informed of the budget, not hard-stopped. Batch API supports 300k output via output-300k-2026-03-24 beta header.
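Opting into both betas looks something like the sketch below. The `anthropic-beta` header with comma-separated beta IDs is Anthropic's standard opt-in mechanism; the beta IDs themselves are the ones named above.

```python
# Sketch: enabling both beta features on a raw HTTP request.
# The anthropic-beta header takes a comma-separated list of beta IDs.
betas = ["task-budgets-2026-03-13", "output-300k-2026-03-24"]

headers = {
    "x-api-key": "YOUR_API_KEY",          # placeholder
    "anthropic-version": "2023-06-01",
    "anthropic-beta": ",".join(betas),
}
print(headers["anthropic-beta"])
```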
Quick reference
| Spec | Value |
|---|---|
| Model ID | claude-opus-4-7 |
| Released | April 16, 2026 |
| Input price | $5.00 / 1M |
| Output price | $25.00 / 1M |
| Context window | 1M tokens |
| Max output | 128K (300K batch) |
| Batch input | $2.50 / 1M |
| Cache hit | $0.50 / 1M |
| Tokenizer impact | Up to 1.35x on code |
FAQ
How much does Claude Opus 4.7 cost per million tokens?
Standard pricing is $5.00 input / $25.00 output per million tokens. Batch API is $2.50/$12.50. Cache hits cost $0.50/MTok. Cache writes are $6.25/MTok (5-min TTL) or $10.00/MTok (1-hr TTL). All rates are identical to Opus 4.6.
Does the new tokenizer make Opus 4.7 more expensive than Opus 4.6?
For code and structured data (JSON, XML), yes - the tokenizer can use up to 35% more tokens for the same content. For standard English prose, the impact is minimal. Anthropic documents that 1M Opus 4.7 tokens covers roughly 555k words, vs 750k words on Opus 4.6.
What replaced extended thinking in Claude Opus 4.7?
Adaptive thinking, enabled via thinking: {type: 'adaptive'}. The old budget_tokens parameter returns a 400 error. Thinking tokens are billed at the $25/MTok output rate and are counted whether or not you display them.
What is Claude Opus 4.7's context window?
1 million tokens input at standard per-token pricing - no long-context surcharge. Maximum output is 128k tokens synchronously. Batch API supports up to 300k output tokens via the output-300k-2026-03-24 beta header.
How does Claude Opus 4.7 compare to GPT-5.4 on benchmarks?
Opus 4.7 leads on coding: SWE-bench Verified 87.6%, SWE-bench Pro 64.3% vs GPT-5.4's 57.7%. GPQA Diamond near-tied at 94.2% vs 94.4% (GPT-5.4 Pro). GPT-5.4 leads on BrowseComp (89.3% Pro vs 79.3%) and Terminal-Bench 2.0 (75.1% vs 69.4%). Opus 4.7 costs 2x more on input.
What is Claude Opus 4.7's API model ID?
claude-opus-4-7 on the Anthropic API. On Google Vertex AI: claude-opus-4-7 or claude-opus-4-7@20260416. On Microsoft Foundry: claude-opus-4-7@20260416.