DeepSeek V4 vs GPT-5.5 vs Claude Opus 4.7: what three frontier models actually cost
Three major releases in just over a week: DeepSeek V4 on April 24, GPT-5.5 on April 23, Claude Opus 4.7 on April 16. GPT-5.5 currently leads global benchmarks. DeepSeek V4-Pro is 91% cheaper at launch. Here's what the numbers actually mean for production workloads.

At a glance
| Model | Input / 1M | Output / 1M | Cache read / 1M | Context / max output | AA Index |
|---|---|---|---|---|---|
| DeepSeek V4-Flash | $0.14 | $0.28 | $0.003 | 1M / 384K | - |
| DeepSeek V4-Pro (discount) | $0.435 | $0.87 | $0.004 | 1M / 384K | - |
| DeepSeek V4-Pro (full price) | $1.74 | $3.48 | $0.015 | 1M / 384K | - |
| Claude Opus 4.7 | $5.00 | $25.00 | $0.50 | 1M / 128K | 57 (#3) |
| GPT-5.5 | $5.00 | $30.00 | $0.50 | 1M / 128K | 60 (#1) |
| GPT-5.5 Pro | $30.00 | $180.00 | - | 1M / 128K | - |
Prices from official provider docs (April 28, 2026). DeepSeek V4-Pro discount expires May 31, 2026. AA Index = Artificial Analysis Intelligence Index. DeepSeek V4-Pro not yet listed on Artificial Analysis.
The price gap is real and it is large
DeepSeek V4-Pro at launch discount costs $0.435/M input and $0.87/M output. GPT-5.5 costs $5/M and $30/M - for every dollar you spend on GPT-5.5 input, V4-Pro costs roughly 9 cents, and on output roughly 3 cents. Claude Opus 4.7 is $5/M and $25/M: the same input price as GPT-5.5 but $5 cheaper per million on output.
Timing matters here. After May 31, DeepSeek V4-Pro goes to $1.74/$3.48 per million tokens - still less than half what GPT-5.5 and Claude charge for input, but the gap on output shrinks from 34x to about 9x. Evaluate V4-Pro for production before June if you want to make the decision at discount prices.
DeepSeek V4-Flash is the more interesting budget story. At $0.14/$0.28 per million tokens with 1M context and 384K max output, it undercuts GPT-5.5 by 97% on input. It is not the same model - Flash uses 13 billion active parameters vs Pro's 49 billion - but it handles reasoning in non-think, think-high, and think-max modes.
| Scenario | DeepSeek V4-Pro (discount) | Claude Opus 4.7 | GPT-5.5 |
|---|---|---|---|
| Batch summarization (1M in / 200K out per day) | $0.61 | $10.00 | $11.00 |
| Real-time chat backend (10M in / 2M out per day) | $6.09 | $100.00 | $110.00 |
| Content moderation pipeline (50M in / 10M out per day) | $30.45 | $500.00 | $550.00 |
| Agent loop at scale (100M in / 20M out per day) | $60.90 | $1,000.00 | $1,100.00 |
Daily cost estimates using discounted V4-Pro rates. Each scenario assumes output volume equal to 20% of input volume. Opus 4.7 uses standard (non-batch) pricing here.
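If you want to rerun these numbers against your own traffic, the arithmetic behind the table is a one-line formula. Here is a minimal sketch with the per-1M prices from the At a glance table hard-coded; swap in the full-price V4-Pro rates after May 31.

```python
# Daily API cost = (input_tokens * input_price + output_tokens * output_price) / 1M.
# Prices are the per-1M-token rates quoted above (April 2026).
PRICES = {  # (input $/1M, output $/1M)
    "deepseek-v4-pro-discount": (0.435, 0.87),
    "claude-opus-4.7": (5.00, 25.00),
    "gpt-5.5": (5.00, 30.00),
}

def daily_cost(model: str, input_tokens: float, output_tokens: float) -> float:
    """Estimated daily spend in USD for one model at the rates above."""
    in_price, out_price = PRICES[model]
    return (input_tokens * in_price + output_tokens * out_price) / 1_000_000

# Example: the "real-time chat backend" scenario (10M input / 2M output per day).
for model in PRICES:
    print(f"{model}: ${daily_cost(model, 10e6, 2e6):,.2f}/day")
# deepseek-v4-pro-discount: $6.09/day
# claude-opus-4.7: $100.00/day
# gpt-5.5: $110.00/day
```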
Benchmark scores: what we actually know
The Artificial Analysis Intelligence Index has GPT-5.5 at 60 (ranked #1 globally) and Claude Opus 4.7 at 57 (ranked #3). DeepSeek V4-Pro is not yet listed there. From the official DeepSeek model card, here is how V4-Pro Max compares with the models DeepSeek benchmarked it against at release:
| Benchmark | DeepSeek V4-Pro Max | Claude Opus 4.6 Max * | GPT-5.4 xHigh * |
|---|---|---|---|
| MMLU-Pro | 87.5% | 89.1% | 87.5% |
| GPQA Diamond | 90.1% | 91.3% | 93.0% |
| HLE | 37.7% | 40.0% | 39.8% |
| LiveCodeBench | 93.5% | 88.8% | - |
| SWE-bench Verified | 80.6% | 80.8% | - |
| BrowseComp | 83.4% | 83.7% | 82.7% |
| Codeforces rating | 3206 | - | - |
* DeepSeek benchmarked against Opus 4.6 and GPT-5.4, not Opus 4.7 or GPT-5.5. Source: DeepSeek V4-Pro model card on HuggingFace. Individual GPT-5.5 benchmark scores could not be obtained from OpenAI's site at the time of writing.
Across these benchmarks, V4-Pro lands within a percentage point or two of Claude Opus 4.6 Max on most tasks and edges it on LiveCodeBench by 4.7 points. Against GPT-5.4, it trails by 2.9 points on GPQA Diamond and 2.1 points on HLE.
Comparing directly to GPT-5.5 is harder right now. GPT-5.5 ranks #1 on Artificial Analysis overall, ahead of the GPT-5.4 generation that V4-Pro was measured against. For coding and agentic work specifically, V4-Pro's SWE-bench Verified score is essentially tied with Opus 4.6 Max (80.6% vs 80.8%) - at a fraction of the price.
Worth being honest about what is missing from this table: direct V4-Pro vs GPT-5.5 head-to-head on the same benchmarks does not exist yet. The comparison above is V4-Pro vs older models; GPT-5.5 ranks higher than those older models; so V4-Pro probably trails GPT-5.5 on at least some benchmarks. By how much, we do not know yet.
What makes DeepSeek V4 different under the hood
DeepSeek V4-Pro has 1.6 trillion total parameters but only 49 billion active per token. V4-Flash is 284 billion total, 13 billion active. Both trained on 32 trillion tokens. The cost-efficiency argument is the same one that made V3 competitive: the active parameter count, not total parameters, drives inference cost.
V4 also introduced two architectural changes worth knowing about. Manifold-Constrained Hyper-Connections (mHC) replace traditional residual connections in the MoE layers, and the Muon optimizer replaces Adam for training stability. For developers, these details mostly matter because they explain why V4-Pro can process 1M-token context while using only about 27% of the FLOPs that V3.2 needs for a single-token inference at similar sequence lengths.
The three reasoning modes (non-think, think-high, think-max) let you tune cost vs quality per request. Non-think mode is cheapest; think-max is where the benchmark numbers above come from. That is a useful dial for production workloads where not every call needs maximum reasoning.
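As an illustration of how that dial could be wired into a service, here is a rough sketch that routes requests to a reasoning mode by task type. It assumes DeepSeek's OpenAI-compatible chat endpoint; the `thinking` field and the `deepseek-v4-pro` model ID are assumptions about the V4 API, not confirmed parameter names, so check the official API reference before relying on them.

```python
# Sketch: choose a V4 reasoning mode per request type.
# ASSUMPTION: mode is selected through a "thinking" request field and the model ID
# is "deepseek-v4-pro"; verify both against the DeepSeek V4 API docs.
from openai import OpenAI  # DeepSeek exposes an OpenAI-compatible API

client = OpenAI(api_key="YOUR_DEEPSEEK_KEY", base_url="https://api.deepseek.com")

MODE_BY_TASK = {
    "classification": "non-think",   # cheap, high-volume calls
    "code_review": "think-high",     # moderate reasoning budget
    "hard_debugging": "think-max",   # the mode the benchmark scores come from
}

def ask(task_type: str, prompt: str) -> str:
    response = client.chat.completions.create(
        model="deepseek-v4-pro",  # placeholder model ID
        messages=[{"role": "user", "content": prompt}],
        extra_body={"thinking": MODE_BY_TASK[task_type]},  # assumed field name
    )
    return response.choices[0].message.content
```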
Claude Opus 4.7: what changed and what the pricing buys you
Opus 4.7 launched April 16 at the same $5/M input price as its predecessor. The meaningful changes are in capability, not price. High-resolution image support went from 1.15 megapixels to 3.75 megapixels. Coordinates are now 1:1 with actual pixels, which matters for computer use tasks where Opus 4.6 required scaling adjustments. A new "xhigh" effort level was added for coding and agentic work.
One thing Anthropic removed: fixed thinking budgets. Opus 4.7 uses adaptive thinking only - you can turn thinking on or off but cannot set a token budget for it. They also removed temperature, top_p, and top_k sampling parameters; passing any non-default value returns a 400 error. If your current Opus 4.6 integration sets any of those, it will break on 4.7.
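If you are migrating an existing Opus 4.6 integration, a defensive wrapper that strips the rejected parameters is a cheap safeguard. Here is a minimal sketch using the Anthropic Python SDK; the `claude-opus-4-7` model ID is a placeholder, so check your console for the real identifier.

```python
# Sketch: strip sampling parameters that Opus 4.7 rejects with a 400 error.
# Assumes the standard Anthropic Python SDK; "claude-opus-4-7" is a placeholder ID.
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment
REMOVED_PARAMS = {"temperature", "top_p", "top_k"}

def create_message(model: str, **kwargs):
    if model.startswith("claude-opus-4-7"):
        # Opus 4.7 returns 400 for any non-default sampling parameter, so drop them.
        kwargs = {k: v for k, v in kwargs.items() if k not in REMOVED_PARAMS}
    return client.messages.create(model=model, **kwargs)

reply = create_message(
    "claude-opus-4-7",
    max_tokens=1024,
    temperature=0.2,  # dropped for 4.7, still forwarded to older models
    messages=[{"role": "user", "content": "Summarize this diff."}],
)
```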
The cache pricing is genuinely useful here. Standard reads cost $0.50/M (one-tenth of the input price), and the batch API cuts costs in half. For workflows that reuse a large system prompt across many calls, cached input at $0.50/M brings Opus 4.7 much closer to V4-Pro on per-call cost.
| Claude Opus 4.7 pricing tier | Input / 1M | Output / 1M |
|---|---|---|
| Standard | $5.00 | $25.00 |
| Cache read | $0.50 | - |
| 5-min cache write | $6.25 | - |
| 1-hour cache write | $10.00 | - |
| Batch API (50% off) | $2.50 | $12.50 |
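To make the cache math concrete, here is a back-of-envelope sketch for a workflow that reuses a large system prompt across calls. The token counts are illustrative; the prices come from the table above.

```python
# Per-call cost for Opus 4.7 with a cached system prompt vs. no caching.
# Prices in $/1M tokens from the Opus 4.7 pricing table; token counts are illustrative.
CACHE_READ, INPUT, OUTPUT = 0.50, 5.00, 25.00

system_prompt = 50_000  # reused and cached after the first call
user_input = 2_000      # fresh tokens per call
output = 1_000

cached = (system_prompt * CACHE_READ + user_input * INPUT + output * OUTPUT) / 1e6
uncached = ((system_prompt + user_input) * INPUT + output * OUTPUT) / 1e6

print(f"cached:   ${cached:.3f} per call")    # cached:   $0.060 per call
print(f"uncached: ${uncached:.3f} per call")  # uncached: $0.285 per call
```

For comparison, the same call on V4-Pro at discount rates works out to roughly $0.023 even without caching.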
GPT-5.5: the benchmark leader at a price premium
GPT-5.5 launched April 23 at $5/M input and $30/M output. That $30 output price is what makes it the most expensive of the three standard-tier models here - $5 more per million output tokens than Claude Opus 4.7 and 34x more than V4-Pro at discount.
OpenAI claims GPT-5.5 uses meaningfully fewer tokens per task than GPT-5.4 for the same agentic work. According to their own usage data from Codex runs, GPT-5.5 generated about 37% fewer tokens than GPT-5.4 for equivalent outputs. If that holds for your workloads, the effective cost per task is closer to GPT-5.4 pricing than the raw per-token numbers suggest.
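One way to sanity-check that claim on your own traffic is to compare cost per completed task rather than cost per token. A rough sketch follows; the 37% figure is OpenAI's own number, and the baseline token count is a placeholder to replace with measurements from your logs.

```python
# Effective output cost per task if GPT-5.5 really emits ~37% fewer tokens than
# GPT-5.4 for equivalent work. Replace the baseline with your own measured traces.
GPT55_OUTPUT_PRICE = 30.00           # $/1M output tokens
tokens_per_task_gpt54 = 8_000        # placeholder baseline from your logs
tokens_per_task_gpt55 = tokens_per_task_gpt54 * (1 - 0.37)

cost_per_task = tokens_per_task_gpt55 * GPT55_OUTPUT_PRICE / 1e6
print(f"~{tokens_per_task_gpt55:.0f} output tokens -> ${cost_per_task:.3f} per task")
# ~5040 output tokens -> $0.151 per task
```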
GPT-5.5 Pro at $30/$180 is available in the Responses API only. That pricing puts a standard 10K input / 2K output call at $0.66, compared to $0.10 for Opus 4.7 and $0.006 for V4-Pro at discount. GPT-5.5 Pro is not a general-purpose API option for most developers; it is priced for high-stakes tasks where benchmark quality matters more than cost.
Which one to use
The answer depends on your risk tolerance, workload type, and token volume.
DeepSeek V4-Pro makes sense if you are running coding tasks, structured extraction, or document processing at scale, and your team can do a proper eval before the discount expires. The SWE-bench numbers suggest it is competitive with Opus 4.6 on coding. If the task does not require GPT-5.5's specific advantages (which are still not well-documented in publicly available benchmarks), the 11x cost difference on input is hard to ignore.
Claude Opus 4.7 makes sense if you need high-resolution vision inputs, the batch API discount is applicable to your workflow, or you are already embedded in the Anthropic ecosystem (AWS Bedrock, Google Vertex, Microsoft Foundry). It scores well on reasoning and is the safer choice for science-adjacent tasks based on HLE scores. It is also the only one of the three with a confirmed 50% batch discount.
GPT-5.5 makes sense if you are building agentic systems where the #1 Artificial Analysis ranking translates to meaningfully better task completion, or if your existing evaluation data already shows GPT-5.4 outperforming alternatives on your specific tasks. The token-efficiency claim is worth testing, but testing means running your own workloads through it rather than taking the marketing number at face value.
One thing I keep coming back to: DeepSeek V4 is not available through AWS Bedrock, Azure, or Google Cloud. For enterprise teams with procurement or data-handling requirements around US-based infrastructure, that is a real constraint that does not show up in the pricing table.
Sources
- DeepSeek API pricing (official): api-docs.deepseek.com
- DeepSeek V4-Pro model card and benchmarks: huggingface.co/deepseek-ai/DeepSeek-V4-Pro
- DeepSeek V4 technical report (PDF): DeepSeek_V4.pdf
- Claude Opus 4.7 pricing (official): platform.claude.com
- Claude Opus 4.7 what's new: platform.claude.com/docs/.../whats-new-claude-4-7
- GPT-5.5 announcement: OpenAI community thread
- GPT-5.5 and GPT-5.5 Pro on OpenRouter: openrouter.ai
- Artificial Analysis Intelligence Index: artificialanalysis.ai