DeepSeek scrapped the May 31 price cliff. The 75% V4-Pro cut is the permanent rate now.
For three weeks the V4-Pro pricing page carried a countdown: the discounted $0.435 input and $0.87 output rates were due to expire on May 31, after which both would have quadrupled. On May 22 DeepSeek deleted the expiry and the promotional label in one edit. The cheap price is no longer a promo. It is the list price, and there is nothing on the calendar pushing it back up.

The discount is the price now. DeepSeek made the V4-Pro cut permanent on May 22. Cache-miss input holds at $0.435 per million tokens, output at $0.87, cache-hit input at $0.003625. The June 1 jump to $1.74 / $3.48 is gone. Nothing else moved: V4-Flash is still $0.14 / $0.28, the weights are still on Hugging Face under MIT, and the model is still a 1.6T-parameter mixture-of-experts with 49B active parameters and a 1M-token context window. The only thing that changed is that the cheapest frontier API of the year stopped being a clearance sale.
What actually changed on May 22
The edit is small and easy to miss. The V4-Pro row on the API pricing page used to read "promotional pricing, valid until 2026-05-31" with the full $1.74 / $3.48 rate listed underneath in grey. As of May 22 that second line is gone, the date is gone, and the word "promotional" is gone. What remains is a single rate: $0.435 input, $0.87 output, $0.003625 cached input, per million tokens. The changelog entry calls it "updated V4-Pro list pricing" and nothing more.
That phrasing matters. A promo that gets extended is still a promo, and a sensible finance team treats it as borrowed time. A list price is the number the vendor commits to until it announces otherwise. DeepSeek has now moved V4-Pro from the first category to the second. The model has been billing at $0.435 since April 24, so for most users the invoice does not change at all. What changes is that the invoice you got last week is now the one you can plan around.
The forecast we told you to run is now wrong
We will own this one. On May 8 we published a piece arguing that the price "every V4-Pro article has been citing is the launch promo, not the steady-state rate," and we told readers to pencil $1.74 / $3.48 into their forecasts and to pull heavy one-shot jobs forward into May, ahead of the cutover. That piece is listed under Sources. The advice was right given what the pricing page said on May 8. It is wrong now.
If you reorganised a batch backfill or an evaluation run to beat the May 31 deadline, you did not waste the effort, but you also did not need to rush. The honest correction: the cheap rate is the rate. You can move production onto V4-Pro on whatever timeline suits you, and you can quote $0.435 / $0.87 in a procurement document without the asterisk we insisted on three weeks ago. The asterisk is gone.
The permanent price against the frontier
Same comparison shape we always use: one task that consumes 1M input tokens and generates 1M output tokens, with no cache and no batch discount. The difference from our last table is that the V4-Pro number no longer needs a "but it expires" footnote.
| Model | Input ($ per 1M tokens) | Output ($ per 1M tokens) | 1M+1M total | vs V4-Pro |
|---|---|---|---|---|
| DeepSeek V4-Flash | $0.14 | $0.28 | $0.42 | 0.3x |
| DeepSeek V4-Pro (now permanent) | $0.435 | $0.87 | $1.305 | 1.0x |
| Gemini 3.1 Pro (200K or less) | $2.00 | $12.00 | $14.00 | 10.7x |
| Claude Opus 4.7 | $5.00 | $25.00 | $30.00 | 23.0x |
| GPT-5.5 | $5.00 | $30.00 | $35.00 | 26.8x |
The 23x and 27x headlines are real, but they describe a 1:1 token mix that almost no production workload has. We will get to why the multiple you actually see is smaller in a moment. The cleaner takeaway from this table is the one that did not change: V4-Pro is the only frontier-quality model in this comparison priced below $2 for the round trip, and that is now its permanent position rather than a window that closes.
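For readers who want to check the table, here is a minimal Python sketch that recomputes the last two columns from the listed rates. The dictionary and function names are ours, chosen for illustration; the prices are copied from the table and nothing is fetched from any API.

```python
# List prices in USD per 1M tokens (input, output), copied from the table above.
PRICES = {
    "DeepSeek V4-Flash": (0.14, 0.28),
    "DeepSeek V4-Pro": (0.435, 0.87),
    "Gemini 3.1 Pro (200K or less)": (2.00, 12.00),
    "Claude Opus 4.7": (5.00, 25.00),
    "GPT-5.5": (5.00, 30.00),
}


def round_trip_cost(model, input_millions=1.0, output_millions=1.0):
    """USD cost for the given volumes (in millions of tokens), no cache, no batch."""
    input_rate, output_rate = PRICES[model]
    return input_millions * input_rate + output_millions * output_rate


def multiple_vs(model, baseline="DeepSeek V4-Pro"):
    """How many times more the 1M+1M task costs on `model` than on `baseline`."""
    return round_trip_cost(model) / round_trip_cost(baseline)


if __name__ == "__main__":
    for name in PRICES:
        print(f"{name:32s} ${round_trip_cost(name):7.3f}  {multiple_vs(name):5.1f}x")
```

Changing the default volumes in `round_trip_cost` is the quickest way to see how far the multiples drift once the mix stops being 1:1.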
Why DeepSeek made it stick
DeepSeek did not explain the decision, so this is reading the board rather than quoting a statement. Two pressures point the same direction. The first is that V4-Pro is genuinely cheap to serve. The model card puts per-token inference at roughly 27% of V3.2's FLOPs at a 1M context and the KV cache at about 10% of V3.2's footprint. When your serving cost drops that far, a 75% discount stops being a loss leader and starts being a margin you can live with indefinitely.
The second is the company it keeps. The sub-$2 frontier tier got crowded fast this spring. Kimi K2.6, Qwen3.6-Plus, MiniMax M2.7 and GLM-5.1 all landed within a few weeks of each other, all aimed at the same buyers, all priced to undercut the American labs. A 4x step-up on June 1 would have handed those rivals a clean talking point: the cheap DeepSeek number was always temporary, theirs is not. Cancelling the cliff takes that argument off the table and keeps V4-Pro at the bottom of the bracket where it has been since launch.
There is a softer reason too. The expiry was doing real damage to adoption. Every team we spoke to that looked at V4-Pro in April hit the same wall: you cannot move production onto a model whose price is scheduled to quadruple in five weeks. The promo was suppressing the exact behaviour DeepSeek wanted. Making it permanent removes the one objection that had nothing to do with the model's quality.
The raise that did not land
The clearest way to value this is to price the bill that is no longer coming. Take a team running a typical input-heavy agent: 2 billion input tokens and 200 million output tokens a month, a 10:1 ratio that matches most retrieval-and-tools workloads. At the permanent rate that is $870 of input plus $174 of output, so $1,044 a month. Had the cliff held, June 1 would have made it $3,480 plus $696, or $4,176. The cancelled increase is worth $3,132 a month to that one team, every month, with no action required on their part.
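The arithmetic in that paragraph is easy to rerun with your own volumes. The sketch below assumes the same hypothetical team (2,000 million input and 200 million output tokens a month) and uses only the rates quoted in this article; the function name is illustrative.

```python
def monthly_bill(input_millions, output_millions, input_rate, output_rate):
    """Monthly USD bill; volumes in millions of tokens, rates in USD per 1M tokens."""
    return input_millions * input_rate + output_millions * output_rate


# Hypothetical workload from the paragraph above: 10:1 input-heavy agent.
INPUT_MILLIONS = 2_000
OUTPUT_MILLIONS = 200

# Rate that is now the list price, and the rate the cancelled cliff would have set.
current = monthly_bill(INPUT_MILLIONS, OUTPUT_MILLIONS, 0.435, 0.87)
after_cliff = monthly_bill(INPUT_MILLIONS, OUTPUT_MILLIONS, 1.74, 3.48)
avoided = after_cliff - current

if __name__ == "__main__":
    print(f"current list price: ${current:,.0f} per month")
    print(f"cancelled cliff:    ${after_cliff:,.0f} per month")
    print(f"increase avoided:   ${avoided:,.0f} per month")
```

Swap in your own monthly token counts to get the figure that applies to your team rather than to our example.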
Now run the same workload against Claude Opus 4.7: 2,000 million input tokens at $5 is $10,000, 200 million output tokens at $25 is $5,000, so $15,000 a month. That is the number worth sitting with. The headline table says V4-Pro is 23x cheaper than Opus, but on this realistic 10:1 mix the gap is 14.4x ($15,000 against $1,044). The reason is that the Opus premium is lopsided: its input rate is about 11.5x V4-Pro's ($5.00 against $0.435), while its output rate is about 28.7x ($25.00 against $0.87), and this workload is mostly input. The lesson is older than this price change: your real multiple depends on your token ratio, not the marketing number. Run your own numbers before you quote anyone a savings figure.
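To see how the multiple moves with the token mix, the following sketch computes the Opus-to-V4-Pro cost ratio for a given number of input tokens per output token. It is an illustration built on the list rates in this article, with no cache or batch discounts; the function and constant names are ours.

```python
# (input, output) list rates in USD per 1M tokens, from the table in this article.
V4_PRO = (0.435, 0.87)
OPUS_4_7 = (5.00, 25.00)


def blended_multiple(input_per_output, expensive=OPUS_4_7, cheap=V4_PRO):
    """Cost multiple for a workload with `input_per_output` input tokens
    for every output token (1.0 is the 1:1 headline mix)."""
    def cost(rates):
        input_rate, output_rate = rates
        return input_per_output * input_rate + output_rate

    return cost(expensive) / cost(cheap)


if __name__ == "__main__":
    for ratio in (1, 3, 10, 30, 100):
        print(f"{ratio:>3}:1 input:output -> {blended_multiple(ratio):.1f}x")
```

The multiple is bounded by the two per-token ratios: it approaches the output ratio (about 28.7x) for output-heavy work and the input ratio (about 11.5x) as the workload becomes almost entirely input.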
What still is not guaranteed
Permanent is a strong word, and it is not a contract. What DeepSeek actually published is a list price with no expiry date, and a list price can move again. It moved down to get here, and DeepSeek has a track record of cutting further when capacity catches up to demand, so the realistic risk is not a surprise hike back to $1.74. It is the opposite: a competitor undercuts, DeepSeek answers, and the number you forecast today turns out conservative. That is a pleasant kind of forecasting error to carry.
Two things make the commitment more credible than a typical promo extension. The weights are open under MIT, so a self-hosted fallback exists if the hosted price ever turns unfriendly, which caps how far DeepSeek can squeeze API customers. And cache-hit input at $0.003625 sits as its own line item, untouched by any of this, so a pipeline with a stable system prompt and a high prefix-cache ratio is already paying close to nothing on the input side regardless of where the headline rate lands. Turn caching on if you have not. It is the cheapest insurance in this whole story.
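How much caching is worth depends on your prefix-cache hit ratio, which this article does not measure. The sketch below blends the two V4-Pro input rates for an assumed hit ratio; the 90% figure in the example is a hypothetical value, not a DeepSeek number, and the function name is ours.

```python
# V4-Pro input rates in USD per 1M tokens, as quoted in this article.
CACHE_MISS_RATE = 0.435
CACHE_HIT_RATE = 0.003625


def effective_input_rate(cache_hit_ratio):
    """Blended input price per 1M tokens when `cache_hit_ratio` (0.0 to 1.0)
    of input tokens are billed at the cache-hit rate."""
    if not 0.0 <= cache_hit_ratio <= 1.0:
        raise ValueError("cache_hit_ratio must be between 0 and 1")
    return (cache_hit_ratio * CACHE_HIT_RATE
            + (1.0 - cache_hit_ratio) * CACHE_MISS_RATE)


if __name__ == "__main__":
    input_millions = 2_000   # same hypothetical workload as earlier in the article
    assumed_hit_ratio = 0.9  # assumption for illustration only
    rate = effective_input_rate(assumed_hit_ratio)
    print(f"effective input rate: ${rate:.4f} per 1M tokens")
    print(f"monthly input cost:   ${input_millions * rate:,.2f}")
```

Measure the hit ratio your own pipeline reports before relying on a figure like this, since it varies with how stable your prompt prefix is.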
Sources
- DeepSeek API pricing - permanent V4-Pro list rate, removed expiry, cache-hit math
- DeepSeek API changelog - May 22 "updated V4-Pro list pricing" entry
- DeepSeek-V4-Pro on Hugging Face - model card, MIT license, parameter counts, serving-cost figures
- Our May 8 piece on the promo deadline - the forecast this update corrects
- Anthropic pricing page - Claude Opus 4.7 list rates for the comparison
- Google Gemini API pricing - Gemini 3.1 Pro tier pricing