TokenCost
Model Release · April 13, 2026 · 8 min read

DeepSeek V4: $0.30 per million tokens for a 1 trillion parameter model

DeepSeek V4 is not out yet. A grayscale test interface appeared April 8, Reuters confirmed a late-April window on April 3, and every major pricing tracker has landed on $0.30 per million input tokens. Here's what the leaked specs mean for anyone running production workloads.


Reported specs at a glance

Status: Not launched (late April 2026 window per Reuters, April 3)
Architecture: MoE — ~1T total parameters, ~37B active per token
Context: 1M tokens via 'Engram' memory (V3.2: 128K)
Chips: Huawei Ascend 910B training, Ascend 950PR inference. Zero Nvidia.
Price (projected): $0.30/M input, $0.50/M output — tracker consensus, not official
Benchmarks: SWE-bench 80-85%, HumanEval 90%, GPQA 59.1% — all internal/unverified
License: Apache 2.0 planned (unconfirmed until release)

Why the $0.30 figure is worth paying attention to

DeepSeek V3.2 already sits at $0.28/M input. So V4 is not a dramatic price cut - it's roughly the same cost as the current model, but with 1 million tokens of context instead of 128K and a significantly larger expert pool. If the leaked benchmark numbers hold up anywhere near frontier-level performance, you're getting that capability at prices most developers associate with mid-tier models.

The comparison that stops me: Claude Opus 4.6 charges $5.00/M input. GPT-5.4 is $2.50/M. DeepSeek V4 is projected at $0.30/M. If V4's real-world results on your actual workloads come close to either of those models, the math at scale is hard to ignore.

There are real caveats - all benchmarks are internal leaks, the Huawei chip dependency introduces enterprise procurement friction, and the model is not live yet. But the signals are consistent enough to be worth understanding before it drops.

Timeline of V4 signals

| Date | Signal | Source |
| --- | --- | --- |
| Feb 2026 | Original Lunar New Year launch window passes; hardware engineering challenges cited. | Multiple reports |
| March 9 | V4 Lite (~200B params) briefly appears on DeepSeek website, then disappears. | Dataconomy |
| March 16 | April launch target reported. | Dataconomy |
| April 3 | Reuters: launch 'within weeks.' Confirms Huawei Ascend chip use. | Reuters / The Information |
| April 7 | TrendForce confirms Huawei Ascend 950PR as inference chip. | TrendForce |
| April 8 | TechNode publishes grayscale screenshot: 'Fast,' 'Expert,' and 'Vision' mode tabs in test interface. | TechNode |
| April 13 | Official DeepSeek API docs still show V3.2 only; no V4 listing. | api-docs.deepseek.com |

The test interface screenshot is the most concrete signal. Three distinct modes in a single interface suggest the product lineup is finalized. DeepSeek does not do long soft launches - V3 and R1 both went public on short notice with minimal pre-announcement.

Why $0.30 is plausible for a 1T parameter model

The number that matters for inference cost is not total parameters - it's active parameters per token. V4's MoE architecture routes each token through roughly 37 billion parameters, the same as V3.2. Total model size went from 671B to ~1T, but per-token compute stayed flat because V4 just has more experts to choose from - not more experts active simultaneously.
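The active-vs-total distinction can be made concrete with a rough FLOPs estimate. A common back-of-envelope rule is ~2 FLOPs per active parameter per generated token; the parameter counts below are the article's reported specs, not measured values:

```python
# Rough per-token inference cost: ~2 FLOPs per active parameter
# (one multiply + one add per weight in the forward pass).
# Parameter counts are reported specs, not measured values.

def flops_per_token(active_params: float) -> float:
    """Approximate forward-pass FLOPs for one generated token."""
    return 2 * active_params

dense_1t = flops_per_token(1.0e12)  # hypothetical dense 1T model
v4_moe = flops_per_token(37e9)      # V4 (reported): ~37B active, same as V3.2

print(f"dense 1T: {dense_1t:.1e} FLOPs/token")
print(f"V4 MoE:   {v4_moe:.1e} FLOPs/token")
print(f"ratio:    {dense_1t / v4_moe:.0f}x")  # ~27x cheaper per token than dense
```

This is why total parameter count tripling barely moves the inference bill: the router picks the same ~37B-parameter slice per token, just from a larger pool.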

The 1M context window is the harder engineering problem. DeepSeek built two new components to make it cost-viable: Engram handles conditional long-context memory, and DeepSeek Sparse Attention (DSA) cuts the compute cost of attending over long sequences. Without something like DSA, serving 1M context at $0.30/M tokens would lose money on every call - attention computation scales quadratically with sequence length.
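To see why something like DSA is needed, compare the relative cost of dense attention at the two context lengths. A minimal sketch of the quadratic scaling (relative operation counts only, ignoring constants and layers):

```python
# Dense attention cost grows quadratically with sequence length: every
# token attends to every other token. Going from 128K to 1M context
# multiplies attention compute by ~61x unless sparsity cuts it down.

def dense_attention_ops(seq_len: int) -> int:
    """Relative cost of full attention over a sequence."""
    return seq_len * seq_len

ctx_v32 = 128_000    # V3.2 context
ctx_v4 = 1_000_000   # reported V4 context

ratio = dense_attention_ops(ctx_v4) / dense_attention_ops(ctx_v32)
print(f"1M vs 128K dense attention cost: {ratio:.0f}x")  # prints "61x"
```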

Whether these architectural claims hold under independent load is unknown. DeepSeek's V3 and R1 claims did hold up when we and others tested them - R1 matched o1 on several benchmarks at roughly 3% of the cost. V4 is making larger claims from a more complex architecture, so I would not assume the same track record carries over automatically.

Where V4 pricing sits against the current market

Using the $0.30/$0.50 projected figures against current production prices:

| Model | Input / 1M | Output / 1M | Context | vs V4 input |
| --- | --- | --- | --- | --- |
| DeepSeek V4 (projected) | $0.30 | $0.50 | 1M | baseline |
| DeepSeek V3.2 (current) | $0.28 | $0.42 | 128K | 1.1x cheaper |
| DeepSeek R1 | $0.55 | $2.19 | 64K | 1.8x more |
| Gemini 3.1 Pro | $2.00 | $12.00 | 1M | 6.7x more |
| GPT-5.4 | $2.50 | $15.00 | 272K / 1M | 8.3x more |
| Claude Opus 4.6 | $5.00 | $25.00 | 200K | 17x more |

V4 pricing from tracker consensus (devtk.ai, NxCode). Not official DeepSeek pricing. All other prices current as of April 2026.

On output tokens, V4 pulls further ahead. At $0.50/M vs Claude Opus 4.6's $25/M, that is a 50x difference on the token type that actually drives costs in most applications. A workload generating 100M output tokens per month costs $50 on V4 vs $2,500 on Opus. At that scale, even a meaningful quality gap might not justify staying put.
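The arithmetic generalizes to any monthly volume. A small sketch using the projected and current prices from the table above (the V4 figures remain tracker consensus, not official; the example volumes are hypothetical):

```python
# Per-million-token prices from the comparison table.
# V4 figures are projected (tracker consensus), not official pricing.
PRICES = {  # model: (input $/M, output $/M)
    "deepseek-v4 (projected)": (0.30, 0.50),
    "deepseek-v3.2": (0.28, 0.42),
    "gpt-5.4": (2.50, 15.00),
    "claude-opus-4.6": (5.00, 25.00),
}

def monthly_cost(model: str, input_m: float, output_m: float) -> float:
    """Dollar cost for input_m / output_m million tokens per month."""
    inp, out = PRICES[model]
    return input_m * inp + output_m * out

# Example workload: 300M input + 100M output tokens per month.
for model in PRICES:
    print(f"{model:26s} ${monthly_cost(model, 300, 100):>9,.2f}")
```

At that example volume the spread is $140/month on projected V4 pricing versus $4,000/month on Opus 4.6, which is the gap the article's 50x output figure translates into.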

Worth saying plainly: DeepSeek V3.2 at $0.28 input is already cheap and performs well on coding tasks. V4 is marginally more expensive on input while adding 1M context - a good trade if you actually need long context, irrelevant if your workloads run sub-32K. You can check V3.2's current live pricing on OpenRouter if you want to compare against third-party providers.

The Huawei chip dependency

DeepSeek V3 ran on Nvidia H800s - the export-restricted version of the H100 that could still reach China. V4 reportedly moves entirely to Huawei Ascend 910B for training and Ascend 950PR for inference. If that holds, it is the clearest evidence yet that US export controls did not achieve their stated goal of slowing Chinese frontier AI development. DeepSeek did not hit a hardware ceiling - it built around it.

For US developers, the practical question is API accessibility. DeepSeek's API has been internationally accessible through V3 and V3.2 without restrictions from the DeepSeek side. Congressional reports have flagged data security concerns, and some governments have restricted DeepSeek apps at the device level, but those restrictions have generally not extended to API access in most jurisdictions.

Enterprise procurement teams at larger US companies will need legal review before routing production traffic through DeepSeek regardless of the per-token price. That friction is real and does not appear in the pricing table. For startups and individual developers, it is much less of a blocker.

All benchmark numbers are from leaks

SWE-bench 80-85%, HumanEval 90%, MMLU 88.5 - these come from internal DeepSeek testing and secondhand reporting. No independent evaluation has happened. The pattern with V3 and R1 was that leaked claims held up well on independent replication, but V4 is making larger claims from a more complex architecture.

One number worth watching specifically: GPQA Diamond. The leaked V4 claim is 59.1%, which would be below Claude Opus 4.6 (80.8%) and well below Gemini 3.1 Pro (94.3%). If that holds, V4 is a strong coding and instruction-following model, not a scientific reasoning model. That changes which workloads it makes sense for.

Most production LLM workloads are not running GPQA Diamond problems. Document processing, code generation, summarization, structured extraction - for these, V4's coding and instruction-following numbers are the relevant signal.

Before V4 launches

If you are running DeepSeek V3.2 now, the upgrade calculus is straightforward: similar price, 8x more context. The main risk is whether the new architecture introduces regressions on tasks V3.2 handles reliably. Run your eval suite against V4 on day one before migrating any production traffic.
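That day-one eval run reduces to a simple regression gate. A minimal sketch, assuming your harness produces per-task pass/fail maps; the task names and result dicts here are hypothetical stand-ins:

```python
# Regression gate for a model migration: flag every task the current
# model (baseline) passes that the candidate model fails.
# Result dicts are stand-ins for your eval harness's real output.

def regressions(baseline: dict[str, bool], candidate: dict[str, bool]) -> list[str]:
    """Tasks the baseline passes but the candidate fails (or lacks)."""
    return sorted(t for t, ok in baseline.items()
                  if ok and not candidate.get(t, False))

v32_results = {"extract_invoice": True, "fix_unit_test": True, "summarize_doc": True}
v4_results = {"extract_invoice": True, "fix_unit_test": False, "summarize_doc": True}

print(regressions(v32_results, v4_results))  # prints "['fix_unit_test']"
```

Any non-empty list means holding production traffic on V3.2 until the regressed tasks are understood, regardless of how good the headline benchmarks look.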

If you are on GPT-5.4 or Claude Opus 4.6 for production workloads and have not seriously tested DeepSeek, V4's launch is a reasonable moment to benchmark it. The price differential is large enough that even a 10-15% quality gap on your specific tasks might be worth accepting for high-volume workloads.

We'll add V4 to the pricing comparison page the day it launches. The cost calculator lets you model what $0.30 input would cost at your current token volumes - worth running before the model drops so you have a number ready when it is time to evaluate.

Sources

  • Reuters (via The Information): "DeepSeek V4 to launch within weeks, running on Huawei chips," April 3, 2026
  • TechNode: "DeepSeek V4 may launch this month, test interface suggests Vision and Expert modes," April 8, 2026
  • TrendForce: "Decoding DeepSeek V4: how Huawei's Ascend 950PR is powering China's push to break CUDA dependence," April 7, 2026
  • Dataconomy: "DeepSeek V4 and Tencent's new Hunyuan model to launch in April," March 16, 2026
  • HuaweiCentral: "DeepSeek V4 model will run entirely on Huawei AI chips," 2026
  • NxCode: DeepSeek V4 release specs and benchmark report, April 2026
  • DevTk.ai: DeepSeek V4 model pricing tracker
  • DeepSeek API docs (api-docs.deepseek.com): current V3.2 pricing ($0.28/$0.42 per million tokens)