Claude Opus 4.7: $5 per million tokens - and what that actually means now
Released April 16, 2026. The per-token rate is unchanged from Opus 4.6. But the new tokenizer can use up to 35% more tokens for the same code or JSON - so your bill depends on what you send, not just how much.

- $5.00/$25.00 per million tokens - same as Opus 4.6. Batch at $2.50/$12.50. Cache hits come in at $0.50/MTok
- New tokenizer: up to 35% more tokens for the same code or JSON. English prose barely changes. This is a token-count increase, not a rate hike
- budget_tokens is gone - adaptive thinking replaced extended thinking. Thinking tokens still bill at $25/MTok, even the ones you never see
- 87.6% on SWE-bench Verified, the highest score on that benchmark publicly reported right now
- 1M context at standard rates, 128k max output (300k via Batch API beta)
The rate card
Anthropic kept the price per token flat. Every tier is identical to Opus 4.6.
| Tier | Input / 1M tokens | Output / 1M tokens |
|---|---|---|
| Standard | $5.00 | $25.00 |
| Batch API (50% off) | $2.50 | $12.50 |
| Cache hit (read) | $0.50 | N/A |
| Cache write, 5-min TTL | $6.25 | N/A |
| Cache write, 1-hr TTL | $10.00 | N/A |
Batch and cache discounts stack. Batch + cache hit = 5% of standard input price. Source: Anthropic pricing docs
The tokenizer change: why the rate card is not the whole story
Anthropic's pricing page includes a small note that matters more than its size suggests: Opus 4.7 uses a new tokenizer, and that tokenizer may produce up to 35% more tokens for the same input text. The per-token price is flat. The token count is not.
The context window section makes this concrete. On Opus 4.6, one million tokens covered roughly 750,000 words. On Opus 4.7, one million tokens covers roughly 555,000 words. That is about 26% less content fitting in the same token budget - or equivalently, the same document now costs up to 35% more tokens to process.
Code and structured data are where you see the full effect. English prose is largely unchanged - the new tokenizer handles standard text similarly to the old one. If your system prompt is a Python file and your user messages are JSON payloads, you are in the 1.3x-1.35x range. Conversational support in English will barely notice the difference.
| Content type | Token multiplier | Effective input cost |
|---|---|---|
| English prose | ~1.0x | ~$5.00/MTok |
| Mixed code and text | ~1.15-1.25x | ~$5.75-6.25/MTok |
| Python / JavaScript code | ~1.2-1.3x | ~$6.00-6.50/MTok |
| JSON / XML / YAML | up to 1.35x | up to $6.75/MTok |
Effective costs based on Anthropic's documented range of 1x to 1.35x token multiplier. Actual impact varies by specific content.
A concrete example. Take a pipeline that processes 1 million words of Python code per day as input. On Opus 4.6, that is roughly 1.33M tokens at $5/MTok, or about $6.65. On Opus 4.7 with a 1.3x multiplier, the same content is 1.73M tokens, or $8.65. That extra $2/day is $730/year - not catastrophic, but real, and entirely invisible from the rate card.
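The arithmetic above generalizes into a few lines of Python. This is an illustrative sketch, not an official tool: the tokens-per-word ratio (~1.33 for code, taken from the example above) and the content-type multipliers are assumptions drawn from Anthropic's documented 1.0-1.35x range.

```python
# Rough input-cost estimator for the Opus 4.6 -> 4.7 tokenizer change.
# Assumptions (not official figures): tokens_per_word ratio and the
# per-content-type multipliers from the table above.

INPUT_RATE = 5.00  # $ per million input tokens, standard tier

def daily_input_cost(words_per_day: float,
                     tokens_per_word: float = 1.33,
                     tokenizer_multiplier: float = 1.0) -> float:
    """Estimated daily input spend in dollars."""
    tokens = words_per_day * tokens_per_word * tokenizer_multiplier
    return tokens / 1_000_000 * INPUT_RATE

old = daily_input_cost(1_000_000)                            # Opus 4.6 baseline
new = daily_input_cost(1_000_000, tokenizer_multiplier=1.3)  # code-heavy 4.7
print(f"Opus 4.6: ${old:.2f}/day, Opus 4.7: ${new:.2f}/day, "
      f"delta ${(new - old) * 365:.0f}/year")
```

Swap in your own words-per-day and multiplier; the point is that the delta scales linearly with volume and is invisible on the rate card.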
The tokenizer change is also why the model improved. A richer vocabulary lets the model reason at a higher level of abstraction. But it is worth auditing your actual token counts on Opus 4.7 before assuming your cost will stay flat.
Adaptive thinking: the billing change you might miss
Extended thinking is gone. If you pass thinking: {type: "enabled", budget_tokens: N} to Opus 4.7, the API returns a 400 error. The replacement is adaptive thinking, enabled with thinking: {type: "adaptive"}.
The billing model is the same: thinking tokens cost $25/MTok (the output rate). But there are two things to watch.
First, you are billed for thinking tokens even when display: "omitted" hides them from your response. Omitting display does not skip the computation - it just stops the tokens from reaching you. The cost is the same either way.
Second, Opus 4.7 adds a new effort level called xhigh - available on this model only. There are now five levels: low, medium, high (the default when adaptive thinking is on), xhigh, and max. Using xhigh or max with complex problems generates substantially more thinking tokens. On a problem that uses 50,000 thinking tokens at default effort, xhigh might reach 80,000-100,000. That is an extra $0.75-1.25 per request at $25/MTok.
| Effort level | Behavior | Availability |
|---|---|---|
| low | Minimizes thinking, may skip entirely | All Claude 4 models |
| medium | Moderate thinking, skips for simple queries | All Claude 4 models |
| high | Always thinks (default when adaptive enabled) | All Claude 4 models |
| xhigh | Deep exploration, always thinks | Opus 4.7 only |
| max | No depth constraints | All Claude 4 models |
If you were using extended thinking with a fixed budget_tokens cap to control costs, adaptive thinking requires a different approach. The model decides how much to think based on the effort level. For cost-sensitive workloads, start at the default high and only use xhigh for tasks that genuinely need the extra depth.
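The migration and its cost implications can be sketched in a few lines. The `thinking` parameter shapes below match what this section describes; the cost helper is a hypothetical illustration of the $25/MTok math, not part of any SDK.

```python
# Before (Opus 4.6, extended thinking): a fixed budget capped thinking cost.
old_thinking = {"type": "enabled", "budget_tokens": 32_000}  # 400 error on 4.7

# After (Opus 4.7, adaptive thinking): the model decides how much to think.
new_thinking = {"type": "adaptive"}

# Hypothetical helper: since budget_tokens no longer caps spend, estimate
# the thinking bill per request. Thinking tokens bill at the output rate.
OUTPUT_RATE = 25.00  # $ per million output tokens

def thinking_cost(thinking_tokens: int) -> float:
    return thinking_tokens / 1_000_000 * OUTPUT_RATE

print(f"50k thinking tokens (default effort): ${thinking_cost(50_000):.2f}")
print(f"100k thinking tokens (xhigh ceiling): ${thinking_cost(100_000):.2f}")
```

The practical replacement for a hard budget cap is monitoring: track thinking-token usage per request and alert when effort levels above the default start dominating your bill.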
Caching: the numbers that matter
Cache hits cost $0.50/MTok - 10% of the standard input rate. If your workload has a large static system prompt, caching brings the effective input cost far below the headline $5/MTok.
Example: a 200,000-token system prompt sent 500 times a day. Without caching: 100M tokens/day at $5/MTok = $500. With caching: first request writes at $6.25/MTok ($1.25), then 499 hits at $0.50/MTok ($49.90 for the cached tokens) = about $51/day total. That is a 90% reduction on the cached portion.
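The example above reduces to a small function. This is the simplified best case - one cache write, then hits - so it ignores TTL expiry, which would add extra writes in a real workload.

```python
# Sketch of the caching math above, using the rate-card numbers.
CACHE_WRITE_5MIN = 6.25   # $/MTok, 5-minute TTL write
CACHE_HIT = 0.50          # $/MTok
STANDARD_INPUT = 5.00     # $/MTok

def daily_prompt_cost(prompt_tokens: int, requests: int, cached: bool) -> float:
    """Daily spend on a static system prompt, in dollars."""
    mtok = prompt_tokens / 1_000_000
    if not cached:
        return mtok * requests * STANDARD_INPUT
    # First request writes the cache; the rest hit it.
    return mtok * CACHE_WRITE_5MIN + mtok * (requests - 1) * CACHE_HIT

print(daily_prompt_cost(200_000, 500, cached=False))  # 500.0
print(daily_prompt_cost(200_000, 500, cached=True))   # ~51.15
```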
There is one interaction to watch: switching between adaptive thinking modes (enabled vs. disabled) breaks cache breakpoints for conversation messages. System prompts and tool definitions remain cached regardless. Keep your thinking mode consistent within a session if caching economics matter.
Batch and cache discounts stack. Batch + cache hit = $2.50 x 10% = $0.25/MTok effective input cost. At that level, the tokenizer overhead is nearly irrelevant.
Benchmark scores
Opus 4.7 leads on coding and agentic tasks. It trails on browser-based research.
| Benchmark | Opus 4.7 | Opus 4.6 | GPT-5.4 | Gemini 3.1 Pro |
|---|---|---|---|---|
| SWE-bench Verified | 87.6% | 80.8% | N/A | 80.6% |
| SWE-bench Pro | 64.3% | 53.4% | 57.7% | 54.2% |
| GPQA Diamond | 94.2% | 91.3% | 94.4%* | 94.3% |
| Terminal-Bench 2.0 | 69.4% | 65.4% | 75.1% | 68.5% |
| MCP-Atlas (tool use) | 77.3% | 75.8% | 68.1% | 73.9% |
| OSWorld-Verified | 78.0% | 72.7% | 75.0% | N/A |
| BrowseComp | 79.3% | 83.7% | 89.3%* | 85.9% |
| CharXiv (visual, tools) | 91.0% | 77.4% | N/A | N/A |
* GPT-5.4 Pro tier. Sources: LLM Stats, BenchLM
The coding numbers are the argument for paying the Opus premium. SWE-bench Verified at 87.6% is roughly 7 points ahead of both Opus 4.6 and Gemini 3.1 Pro. SWE-bench Pro - which uses harder, real production repositories - shows the same gap: 64.3% vs GPT-5.4's 57.7% and Gemini 3.1 Pro's 54.2%. For code agents, that gap is meaningful.
BrowseComp is the weak spot. Opus 4.7 scores 79.3% on web research, while GPT-5.4 Pro hits 89.3% and Gemini 3.1 Pro reaches 85.9%. For browser-based research workloads, GPT-5.4 beats Opus 4.7 on that benchmark at half the standard input cost ($2.50 vs $5.00/MTok).
Cost comparison
| Model | Input / 1M | Output / 1M | Cache hit | Best for |
|---|---|---|---|---|
| Claude Opus 4.7 | $5.00 | $25.00 | $0.50 | Coding agents, complex tool use |
| GPT-5.4 | $2.50 | $15.00 | $1.25 | Web research, broad reasoning |
| Gemini 3.1 Pro | $2.00 | $12.00 | $0.20 | Cost-efficient frontier tasks |
| Claude Sonnet 4.6 | $3.00 | $15.00 | $0.30 | Anthropic quality at lower cost |
At standard pricing, Opus 4.7's input costs 2x GPT-5.4's and 2.5x Gemini 3.1 Pro's. Add the tokenizer overhead on code-heavy workloads and the effective gap vs GPT-5.4 reaches 2.7x. The premium is real.
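The stacked multipliers are easy to check. A two-line sketch, using the rate-card prices and Anthropic's documented 1.35x worst-case tokenizer multiplier:

```python
# How the tokenizer overhead widens the headline price gap vs GPT-5.4.
OPUS_INPUT, GPT_INPUT = 5.00, 2.50  # $/MTok standard input

def effective_gap(tokenizer_multiplier: float) -> float:
    """Effective Opus 4.7 input cost as a multiple of GPT-5.4's."""
    return OPUS_INPUT * tokenizer_multiplier / GPT_INPUT

print(f"{effective_gap(1.0):.1f}x")   # English prose
print(f"{effective_gap(1.35):.1f}x")  # JSON-heavy input, worst case
```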
For workloads that do not specifically need top-tier coding benchmarks, Sonnet 4.6 at $3/$15 is worth testing first. Same 1M context, same Anthropic model family, 40% less per input token.
Other changes from Opus 4.6
Sampling parameters restricted
You can no longer set both temperature and top_p. Pick one - setting both returns a 400 error. Temperature and top_k also cannot both be set to non-default values.
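A hypothetical client-side guard can fail fast before the API returns the 400. The rule set is as described above; the helper name and the assumption that `None` means "not set" are illustrative, not part of any SDK.

```python
# Hypothetical pre-flight check for Opus 4.7's sampling restrictions.
# Assumption: None means "parameter not sent", i.e. API default.

def validate_sampling(temperature=None, top_p=None, top_k=None):
    if temperature is not None and top_p is not None:
        raise ValueError("Opus 4.7: set temperature or top_p, not both")
    if temperature is not None and top_k is not None:
        raise ValueError("Opus 4.7: temperature and top_k cannot both be non-default")

validate_sampling(temperature=0.7)               # fine
# validate_sampling(temperature=0.7, top_p=0.9)  # would raise ValueError
```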
Thinking content hidden by default
On Opus 4.6, thinking output defaulted to 'summarized'. On 4.7 it defaults to 'omitted'. Add display: 'summarized' explicitly if you depend on receiving thinking traces.
Higher-resolution vision
Max image resolution raised from 1.15MP to 3.75MP. Coordinates in computer use tasks are now 1:1 with actual pixels - no scaling needed.
Knowledge cutoff moved up
Reliable knowledge through January 2026, vs May 2025 on Opus 4.6. Eight months of additional training data, which matters for any workload involving recent models, tools, or events.
Task budgets (beta)
New advisory token limit for agentic loops via the task-budgets-2026-03-13 beta header. The model is informed of the budget, not hard-stopped. Batch API supports 300k output via output-300k-2026-03-24 beta header.
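Opting into both betas looks something like the sketch below. The `anthropic-beta` header with comma-separated beta IDs is Anthropic's standard opt-in mechanism; the beta IDs themselves are the ones named above.

```python
# Sketch: enabling both beta features on a raw HTTP request.
# The anthropic-beta header takes a comma-separated list of beta IDs.
betas = ["task-budgets-2026-03-13", "output-300k-2026-03-24"]

headers = {
    "x-api-key": "YOUR_API_KEY",          # placeholder
    "anthropic-version": "2023-06-01",
    "anthropic-beta": ",".join(betas),
}
print(headers["anthropic-beta"])
```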
Quick reference
| Spec | Value |
|---|---|
| Model ID | claude-opus-4-7 |
| Released | April 16, 2026 |
| Input price | $5.00 / 1M |
| Output price | $25.00 / 1M |
| Context window | 1M tokens |
| Max output | 128K (300K batch) |
| Batch input | $2.50 / 1M |
| Cache hit | $0.50 / 1M |
| Tokenizer impact | Up to 1.35x on code |
FAQ
How much does Claude Opus 4.7 cost per million tokens?
Standard pricing is $5.00 input / $25.00 output per million tokens. Batch API is $2.50/$12.50. Cache hits cost $0.50/MTok. Cache writes are $6.25/MTok (5-min TTL) or $10.00/MTok (1-hr TTL). All rates are identical to Opus 4.6.
Does the new tokenizer make Opus 4.7 more expensive than Opus 4.6?
For code and structured data (JSON, XML), yes - the tokenizer can use up to 35% more tokens for the same content. For standard English prose, the impact is minimal. Anthropic documents that 1M Opus 4.7 tokens covers roughly 555k words, vs 750k words on Opus 4.6.
What replaced extended thinking in Claude Opus 4.7?
Adaptive thinking, enabled via thinking: {type: 'adaptive'}. The old budget_tokens parameter returns a 400 error. Thinking tokens are billed at the $25/MTok output rate and are counted whether or not you display them.
What is Claude Opus 4.7's context window?
1 million tokens input at standard per-token pricing - no long-context surcharge. Maximum output is 128k tokens synchronously. Batch API supports up to 300k output tokens via the output-300k-2026-03-24 beta header.
How does Claude Opus 4.7 compare to GPT-5.4 on benchmarks?
Opus 4.7 leads on coding: SWE-bench Verified 87.6%, SWE-bench Pro 64.3% vs GPT-5.4's 57.7%. GPQA Diamond near-tied at 94.2% vs 94.4% (GPT-5.4 Pro). GPT-5.4 leads on BrowseComp (89.3% Pro vs 79.3%) and Terminal-Bench 2.0 (75.1% vs 69.4%). Opus 4.7 costs 2x more on input.
What is Claude Opus 4.7's API model ID?
claude-opus-4-7 on the Anthropic API. On Google Vertex AI: claude-opus-4-7 or claude-opus-4-7@20260416. On Microsoft Foundry: claude-opus-4-7@20260416.