DeepSeek V4-Pro's 75% promo ends May 31. After that, the price is 4x what most people are quoting.
The price every V4-Pro article has been citing since April 24 is the launch promo, not the steady-state rate. DeepSeek extended the discount once already; the current cutoff on the official pricing page is May 31, 2026 at 15:59 UTC. After that, $0.435 input becomes $1.74 and $0.87 output becomes $3.48. Twenty-three days of promo are left, and the post-cutover price is the one that should be in your forecast spreadsheet, not the one in last week's VentureBeat headline.

Cutover: 2026-05-31, 15:59 UTC. Cache-miss input goes from $0.435 to $1.74 per million tokens, and output goes from $0.87 to $3.48. Cache-hit input goes from $0.003625 to $0.0145; it is listed as a separate line item and stays at 1/120 of the cache-miss rate. V4-Flash is untouched at $0.14 / $0.28. Post-cutover, V4-Pro still bills at roughly a sixth of Opus 4.7's rate and a seventh of GPT-5.5's. Open weights, MIT license, 1M context, mixture-of-experts at 1.6T total parameters with 49B active.
The promo, in plain numbers
DeepSeek published V4-Pro on April 24, 2026 at a flat 75% off the full list rate and wrote the discount into the API pricing docs. The original promo carried an earlier end date which has since been pushed to May 31. The page still labels the entry as promotional, so the fair assumption is that the next move is up rather than down.
| Per 1M tokens | Promo (now until May 31) | Full price (June 1 onward) | Step |
|---|---|---|---|
| Input (cache miss) | $0.435 | $1.74 | 4.0x |
| Input (cache hit) | $0.003625 | $0.0145 | 4.0x |
| Output | $0.87 | $3.48 | 4.0x |
The 4x is uniform. Whatever your input/output ratio looks like, your bill on June 1 is four times what the same traffic cost on May 30. The cache-hit rate stays negligibly small in absolute terms, but if your prefix-cache ratio sits high, do the multiplication anyway: a 60% cache-hit pipeline still inherits the 4x on the remaining 40% of input plus all of the output.
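A minimal sketch of that multiplication, using the rates from the table above. The workload shapes are made up for illustration, and the `bill` helper is a hypothetical name, not part of any DeepSeek SDK.

```python
# Per-1M-token list rates in USD, copied from the table above.
PROMO = {"miss": 0.435, "hit": 0.003625, "out": 0.87}
FULL = {"miss": 1.74, "hit": 0.0145, "out": 3.48}


def bill(rates, input_mtok, output_mtok, cache_hit_ratio=0.0):
    """Return the USD cost for token volumes given in millions of tokens."""
    hit = input_mtok * cache_hit_ratio      # input tokens served from cache
    miss = input_mtok - hit                 # input tokens billed at the miss rate
    return miss * rates["miss"] + hit * rates["hit"] + output_mtok * rates["out"]


# Hypothetical workload shapes: (input Mtok, output Mtok, cache-hit ratio).
shapes = [(100, 10, 0.0), (100, 10, 0.6), (10, 100, 0.9)]
for inp, out, hit in shapes:
    promo = bill(PROMO, inp, out, hit)
    full = bill(FULL, inp, out, hit)
    print(f"in={inp}M out={out}M hit={hit:.0%}: "
          f"${promo:.2f} -> ${full:.2f} ({full / promo:.1f}x)")
```

Because every line item scales by the same factor, the ratio comes out at 4.0x for any mix; caching lowers both bills but not the step between them.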
What V4-Pro is, for the people who skipped the launch week
V4-Pro is the larger half of the V4 family that DeepSeek shipped on April 24. It is a 1.6 trillion parameter mixture-of-experts model with 49B parameters active per forward pass and a 1M-token context window. The model card recommends a 384K context budget when running in Think Max mode. The instruct checkpoint runs in mixed FP4 plus FP8 precision. Weights are on Hugging Face under an MIT license, which is the unusual part for a model that holds 80% on SWE-bench Verified.
The architecture work behind it - hybrid attention combining CSA and HCA, the Manifold-Constrained Hyper-Connections wiring, the Muon optimizer - is interesting technical detail, but it is not what changes for buyers. The two numbers that matter for the bill are the per-token inference FLOPs (around 27% of V3.2's at 1M context) and the KV cache footprint (10% of V3.2). Cheap to serve translates, with a discount on top, into the lowest steady-state frontier price published anywhere this year.
V4-Pro also ships three reasoning effort modes: non-think, think (default), and think-max, which DeepSeek calls V4-Pro-Max in the benchmark tables. All three modes bill at the same per-token rate. The differences are in latency and total tokens consumed per response, not in the price per token, which means think-max disproportionately punishes your bill on long generation if you are not careful with output caps.
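To make the output-cap point concrete, here is a rough sketch that bounds the output cost of a single response. It assumes reasoning tokens are billed as output tokens and that your client enforces a hard output cap; the cap values and function names are hypothetical.

```python
OUTPUT_RATE_FULL = 3.48  # USD per 1M output tokens, post-promo list rate


def worst_case_output_cost(max_output_tokens, rate_per_mtok=OUTPUT_RATE_FULL):
    """Upper bound on output cost (USD) for one response hitting its cap."""
    return max_output_tokens / 1_000_000 * rate_per_mtok


def cap_for_budget(budget_usd, rate_per_mtok=OUTPUT_RATE_FULL):
    """Largest output cap (tokens) that keeps one response within budget_usd."""
    return int(budget_usd / rate_per_mtok * 1_000_000)


# Hypothetical caps; the per-token rate is the same in every reasoning mode.
for cap in (2_000, 8_000, 32_000):
    print(f"cap={cap:>6} tokens -> at most ${worst_case_output_cost(cap):.4f}")
print(f"cap for a $0.05 output budget: {cap_for_budget(0.05)} tokens")
```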
What 23 days of promo pricing actually buys you
Most teams will not move large production traffic onto a model that is mid-promo. Where the window matters is for one-shot heavy passes: backfilling embeddings or summaries on a large historical corpus, running an evaluation suite at scale, generating synthetic training data, or auditing a regression on yesterday's production logs. Push those forward into May, and the price difference is real.
A concrete shape: an evaluation harness that submits 50M input tokens and reads back 5M output tokens. Promo bill: $26.10. Post-promo bill: $104.40. That is the difference between an ad-hoc engineering experiment and something a manager has to sign off on. If you have queued evals to run, run them now.
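The same arithmetic can be rerun for any queued job. This is a sketch under the assumption of uncached input and no batch discount; `promo_window_savings` is a hypothetical helper name.

```python
# (input, output) USD per 1M tokens, cache-miss rates from the pricing table.
PROMO = (0.435, 0.87)
FULL = (1.74, 3.48)


def job_cost(rates, input_mtok, output_mtok):
    """USD cost of a one-shot job, volumes given in millions of tokens."""
    rate_in, rate_out = rates
    return input_mtok * rate_in + output_mtok * rate_out


def promo_window_savings(input_mtok, output_mtok):
    """Return (cost before cutover, cost after cutover, difference) in USD."""
    before = job_cost(PROMO, input_mtok, output_mtok)
    after = job_cost(FULL, input_mtok, output_mtok)
    return before, after, after - before


# The evaluation harness shape described above: 50M in, 5M out.
before, after, saved = promo_window_savings(50, 5)
print(f"before cutover: ${before:.2f}")
print(f"after cutover:  ${after:.2f}")
print(f"saved by running in May: ${saved:.2f}")
```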
The opposite is also worth saying. If you have been running V4-Pro continuously in production over the past two weeks, your daily run rate quadruples on June 1 regardless of any choice you make in May. Update the forecast. Tell finance. Pencil in the post-promo number, not the one in the launch coverage.
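For a forecast that spans the cutover, one option is to key the rate lookup to the published timestamp. The sketch below assumes the full price applies from 15:59 UTC on May 31 onward (whether that exact minute is inclusive is not stated here) and uses a made-up daily volume.

```python
from datetime import datetime, timezone

# Cutover timestamp from the pricing page, as cited in this article.
CUTOVER = datetime(2026, 5, 31, 15, 59, tzinfo=timezone.utc)

PROMO = {"input": 0.435, "output": 0.87}  # USD per 1M tokens, cache miss
FULL = {"input": 1.74, "output": 3.48}


def rates_at(ts):
    """Rates assumed to apply at ts, which must be a timezone-aware datetime."""
    return PROMO if ts < CUTOVER else FULL


def daily_cost(ts, input_mtok, output_mtok):
    """USD cost of one day's traffic, volumes in millions of tokens."""
    r = rates_at(ts)
    return input_mtok * r["input"] + output_mtok * r["output"]


# Hypothetical steady traffic: 20M input and 2M output tokens per day.
for day in (datetime(2026, 5, 30, tzinfo=timezone.utc),
            datetime(2026, 6, 1, tzinfo=timezone.utc)):
    print(f"{day:%Y-%m-%d}: ${daily_cost(day, 20, 2):.2f}")
```

If DeepSeek extends the promo again, only `CUTOVER` needs to change.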
Where V4-Pro sits against the frontier, before and after
One agent task: 1M input tokens consumed, 1M output tokens generated, no cache, no batch discount. Ugly but level-playing-field. Same shape across the four models most teams cross-shop at the top of the quality ladder right now.
| Model | Input | Output | 1M+1M total | Multiple of V4-Pro promo |
|---|---|---|---|---|
| V4-Pro (promo, until May 31) | $0.435 | $0.87 | $1.305 | 1.0x |
| V4-Pro (post-promo) | $1.74 | $3.48 | $5.22 | 4.0x |
| Gemini 3.1 Pro (200K or less) | $2.00 | $12.00 | $14.00 | 10.7x |
| Claude Opus 4.7 | $5.00 | $25.00 | $30.00 | 23.0x |
| GPT-5.5 | $5.00 | $30.00 | $35.00 | 26.8x |
Two things the table is saying. The gap to Opus 4.7 was the kind of difference that ends procurement debates during the promo and merely wins them after; even at full price, V4-Pro bills at about a sixth of Opus and the open-weights story still stands. The shift on Gemini 3.1 Pro is more interesting: a roughly 11x cost advantage shrinks to 2.7x, which is the band where output quality starts driving the decision more than the per-token rate. Anyone who wrote a procurement brief on promo math gets to redo it on the roughly 1/6-the-cost framing that most of the launch coverage used.
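The multiples in the table depend on the 1M+1M task shape. A short sketch for recomputing them against either V4-Pro baseline, or against your own input/output mix, using only the rates listed above:

```python
# (input, output) USD per 1M tokens, copied from the comparison table.
RATES = {
    "V4-Pro (promo)": (0.435, 0.87),
    "V4-Pro (post-promo)": (1.74, 3.48),
    "Gemini 3.1 Pro (200K or less)": (2.00, 12.00),
    "Claude Opus 4.7": (5.00, 25.00),
    "GPT-5.5": (5.00, 30.00),
}


def task_cost(model, input_mtok=1.0, output_mtok=1.0):
    """USD cost of one task, volumes in millions of tokens, no cache or batch."""
    rate_in, rate_out = RATES[model]
    return input_mtok * rate_in + output_mtok * rate_out


def multiples(baseline, input_mtok=1.0, output_mtok=1.0):
    """Cost of every model expressed as a multiple of the baseline model."""
    base = task_cost(baseline, input_mtok, output_mtok)
    return {m: task_cost(m, input_mtok, output_mtok) / base for m in RATES}


for baseline in ("V4-Pro (promo)", "V4-Pro (post-promo)"):
    print(f"baseline: {baseline}")
    for model, mult in multiples(baseline).items():
        print(f"  {model:<32}{mult:5.1f}x")
```

Passing a different `input_mtok` / `output_mtok` pair shows how the gap moves for input-heavy or output-heavy workloads.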
The benchmark comparison DeepSeek's own card draws
Worth flagging because of how these numbers spread. The numbers in DeepSeek's published benchmark table on Hugging Face compare V4-Pro-Max against Opus 4.6 Max, GPT-5.4 xHigh, and Gemini 3.1 Pro High. Not Opus 4.7. Not GPT-5.5. The labels in a lot of secondary coverage have drifted forward to the newer Anthropic and OpenAI models, which are not the ones DeepSeek itself benchmarked.
| Benchmark | V4-Pro-Max | Opus 4.6 Max | GPT-5.4 xHigh | Gemini 3.1 Pro High |
|---|---|---|---|---|
| SWE-bench Verified | 80.6 | n/a | n/a | n/a |
| SWE-bench Pro | 55.4 | n/a | n/a | n/a |
| MMLU-Pro | 87.5 | 89.1 | 87.5 | 91.0 |
| GPQA Diamond | 90.1 | 91.3 | 93.0 | 94.3 |
| SimpleQA-Verified | 57.9 | 46.2 | 45.3 | 75.6 |
| HLE | 37.7 | 40.0 | n/a | n/a |
| Terminal-Bench 2.0 | 67.9 | n/a | n/a | n/a |
The honest reading: V4-Pro-Max sits within a percentage point or two of Opus 4.6 and GPT-5.4 on the headline knowledge benchmarks, beats both on Chinese-language QA and SimpleQA, trails Gemini 3.1 Pro on hard knowledge, and comes in slightly under Opus on Humanity's Last Exam. SWE-bench numbers are V4-Pro alone because DeepSeek did not publish matched-methodology numbers for the closed labs there. Anybody quoting V4-Pro at parity with Opus 4.7 or GPT-5.5 specifically is borrowing the comparison and updating the labels.
What changes May 31, what does not
Three pieces of the V4 pricing structure are independent of the promo and are not moving on June 1. First, V4-Flash. The cheap variant ships at $0.14 input / $0.28 output, no discount applied, and is not advertised as promotional anywhere on the page. Second, the cache-hit ratio, meaning the cache-hit price relative to the cache-miss price. The pricing page lists cache-hit input at $0.0145 per million tokens at full price (and $0.003625 during the promo), which is 1/120 of the cache-miss rate in both cases. The absolute cache-hit price rises with everything else, but the cache discount itself is a separate line item rather than part of the promo. Third, the open weights. V4-Pro and V4-Pro-Base remain on Hugging Face under MIT, so anyone running it self-hosted does not see a price change at all on June 1.
Two pieces could change but have not been announced. There is no commitment from DeepSeek that the May 31 cutover holds; the promo has been extended once already, so a second extension is plausible. There is also no commitment that the $1.74/$3.48 number holds long-term. DeepSeek has historically reset prices downward when capacity catches up to demand. Treat the published full price as the base case for forecasting, treat further extensions as upside, and treat self-hosting as the option you keep open if the spread to Opus 4.7 narrows further.
If your production stack already runs V4-Pro
Run the post-promo number through your forecast spreadsheet today, not in June. If you have spend approval based on the promo math, get it re-approved at the full price now, before someone notices the bill quadrupled. Almost nobody who gets a green light at $0.435 has explicit sign-off at $1.74; the language usually says "V4-Pro" without a specific number.
If you have not yet enabled prefix caching, do that this week. Cache-hit input at $0.0145 stays effectively free relative to anything else in the bracket, and cache-hit ratios above 50% on stable system prompts are the difference between a painful June 1 and an unremarkable one. Cache-hit pricing sits as a separate line item from the promo, so it stays cheap regardless of what happens to the headline rate.
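To see how far caching can offset the step, the sketch below computes the blended post-promo input rate at a given cache-hit ratio, plus the hit ratio at which that blended rate falls back to the uncached promo input rate. It covers input only; output still steps up 4x. The function names are hypothetical.

```python
FULL_MISS = 1.74    # USD per 1M input tokens, cache miss, post-promo
FULL_HIT = 0.0145   # USD per 1M input tokens, cache hit, post-promo
PROMO_MISS = 0.435  # USD per 1M input tokens, cache miss, during the promo


def blended_input_rate(hit_ratio, miss_rate=FULL_MISS, hit_rate=FULL_HIT):
    """Effective USD per 1M input tokens at a cache-hit ratio between 0 and 1."""
    if not 0.0 <= hit_ratio <= 1.0:
        raise ValueError("hit_ratio must be between 0 and 1")
    return (1.0 - hit_ratio) * miss_rate + hit_ratio * hit_rate


def break_even_hit_ratio(target_rate=PROMO_MISS):
    """Hit ratio at which the post-promo blended input rate equals target_rate."""
    return (FULL_MISS - target_rate) / (FULL_MISS - FULL_HIT)


for h in (0.0, 0.5, 0.8):
    print(f"hit ratio {h:.0%}: ${blended_input_rate(h):.4f} per 1M input tokens")
print(f"break-even vs uncached promo input: {break_even_hit_ratio():.1%}")
```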
For workloads where V4-Pro-Max is overkill, switch the routing layer to V4-Flash. Paying $0.14 input / $0.28 output for a small model with an 80% SWE-bench-class capability ceiling is a genuinely strange amount of capability per dollar, and that price does not change on June 1. Save Pro for the requests that actually need the larger model.
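A rough sketch of what that routing buys at post-promo rates. The `needs_pro` flag, the tier labels, and the traffic sample are all hypothetical; how you decide which requests need Pro is up to your own routing layer, and the labels are not API model identifiers.

```python
# (input, output) USD per 1M tokens. Flash is not promotional; Pro is post-promo.
RATES = {"flash": (0.14, 0.28), "pro": (1.74, 3.48)}


def pick_tier(needs_pro):
    """Hypothetical routing rule: send a request to Pro only when flagged."""
    return "pro" if needs_pro else "flash"


def routed_cost(requests):
    """Sum USD cost over (needs_pro, input_tokens, output_tokens) tuples."""
    total = 0.0
    for needs_pro, in_tok, out_tok in requests:
        rate_in, rate_out = RATES[pick_tier(needs_pro)]
        total += (in_tok * rate_in + out_tok * rate_out) / 1_000_000
    return total


# Made-up traffic sample: 7 routine requests and 3 hard ones, same token shape.
sample = [(False, 20_000, 2_000)] * 7 + [(True, 20_000, 2_000)] * 3
all_pro = [(True, i, o) for _, i, o in sample]
print(f"routed:  ${routed_cost(sample):.4f}")
print(f"all Pro: ${routed_cost(all_pro):.4f}")
```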
Sources
- DeepSeek API pricing - official promo end date, full and discounted rates, cache-hit math
- DeepSeek API changelog - April 24 V4 launch and pricing change history
- DeepSeek-V4-Pro on Hugging Face - model card, license, parameter counts, benchmark table
- DeepSeek-V4-Flash on Hugging Face - V4-Flash specifications and benchmarks
- Anthropic pricing page - Opus 4.7 list rates and cache pricing for the comparison
- Google Gemini API pricing - Gemini 3.1 Pro Preview tier pricing
- VentureBeat - DeepSeek V4 arrives at 1/6 the cost - launch coverage, used as cross-check for capability framing only