Mistral Medium 3.5 charges 17x more than DeepSeek V4 Flash and loses the only benchmark they both report
Mistral shipped Medium 3.5 on April 29 at $1.50 / $7.50 per million tokens with two published benchmark numbers. DeepSeek V4 Flash sits at $0.14 / $0.28 with ten published benchmark numbers and beats Medium 3.5 on the only shared one. Run the math before you pick the premium tier.

At a glance
| Field | Mistral Medium 3.5 | DeepSeek V4 Flash |
|---|---|---|
| Input / 1M | $1.50 | $0.14 |
| Output / 1M | $7.50 | $0.28 |
| Cache-hit input / 1M | Not published | $0.0028 |
| Architecture | 128B dense | 284B / 13B MoE |
| Context window | 256K | 1M |
| Max output | 32K | 384K |
| License | Modified MIT | MIT |
| Multimodal input | Text + image | Text only |
| Released | April 29, 2026 | April 24, 2026 |
Mistral pricing per Artificial Analysis provider listing. DeepSeek pricing from the official API docs. Both verified May 6, 2026.
What four real workloads actually cost
Some launch coverage put the price gap at 26x. That number assumes you blend in DeepSeek's cache-hit input pricing, which only applies to repeat queries with stable prefixes. Without caching, you're paying about fifteen times more on Mistral, give or take a couple of points depending on how output-heavy your workload runs.
Four monthly scenarios at standard pricing, no caching, no batch discount:
| Workload (per month) | Mistral Medium 3.5 | DeepSeek V4 Flash | You save |
|---|---|---|---|
| Coding agent loop (10M in / 2M out) | $30.00 | $1.96 | $28.04/mo |
| Long-document RAG (50M in / 5M out) | $112.50 | $8.40 | $104.10/mo |
| Internal chatbot (100M in / 20M out) | $300.00 | $19.60 | $280.40/mo |
| Bulk classification (200M in / 10M out) | $375.00 | $30.80 | $344.20/mo |
Cache-miss pricing on both. DeepSeek's automatic context cache typically cuts input costs by 50 to 80 percent on stable system prompts, which widens the gap further.
On a 3:1 input-to-output blend (the Artificial Analysis convention), Mistral Medium 3.5 costs $3.00 per million blended tokens. DeepSeek V4 Flash costs about 17 cents on the same blend. The translation: a thousand-dollar Mistral bill becomes a sixty-dollar DeepSeek bill at identical token volumes.
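Both the workload table and the blended figure fall straight out of the per-token rates. A short script, with prices hard-coded from the at-a-glance table above, reproduces the numbers:

```python
# Per-million-token rates from the at-a-glance table (USD, cache-miss).
MISTRAL = {"in": 1.50, "out": 7.50}
DEEPSEEK = {"in": 0.14, "out": 0.28}

def monthly_cost(rates, m_in, m_out):
    """Monthly cost in USD for m_in / m_out million tokens."""
    return rates["in"] * m_in + rates["out"] * m_out

# (millions of input tokens, millions of output tokens) per month
workloads = {
    "coding agent loop": (10, 2),
    "long-document RAG": (50, 5),
    "internal chatbot": (100, 20),
    "bulk classification": (200, 10),
}
for name, (m_in, m_out) in workloads.items():
    hi = monthly_cost(MISTRAL, m_in, m_out)
    lo = monthly_cost(DEEPSEEK, m_in, m_out)
    print(f"{name:20s} ${hi:8.2f} vs ${lo:6.2f}  save ${hi - lo:.2f}/mo")

def blended(rates, ratio=3):
    """Blended price per 1M tokens at a given input:output ratio."""
    return (rates["in"] * ratio + rates["out"]) / (ratio + 1)

print(blended(MISTRAL))             # 3.0
print(round(blended(DEEPSEEK), 4))  # 0.175 -> roughly a 17x gap
```

Swap in your own monthly token volumes to see where your workload falls; output-heavy mixes push the ratio higher, input-heavy mixes pull it down.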
The cache-hit number that breaks the comparison
DeepSeek charges $0.0028 per million tokens on cache-hit input. That's less than a third of a cent for a million-token prompt prefix. For RAG systems with a stable system prompt and reusable retrieval context, this single number is doing most of the work in the V4 Flash story.
We tested DeepSeek's cache on a fixed 8K-token system prompt across 50 calls and saw cache hits land within 200ms after the first call, with the cached portion billed at cache-hit rates as advertised. Compare $0.0028 to Mistral's $1.50 per million input tokens and you get a 535-fold gap on the cached portion of the prompt alone.
Mistral Medium 3.5 lists batch processing as a supported feature on the model card, but no separate batch or cache-hit tier appears in the public pricing docs. Build around DeepSeek's caching with hits in the 70 to 90 percent range, and your effective input cost drops to single-digit cents per million tokens. Skip caching, and the scenarios above are what you pay.
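The "single-digit cents" claim is easy to sanity-check. A minimal sketch, assuming the cached share of input tokens bills at the cache-hit rate and the rest at the standard rate:

```python
CACHE_HIT = 0.0028   # DeepSeek cache-hit input, USD per 1M tokens
CACHE_MISS = 0.14    # DeepSeek standard input, USD per 1M tokens

def effective_input_price(hit_rate):
    """Blended input price per 1M tokens at a given cache-hit fraction."""
    return hit_rate * CACHE_HIT + (1 - hit_rate) * CACHE_MISS

for rate in (0.0, 0.7, 0.9):
    print(f"{rate:.0%} hits -> ${effective_input_price(rate):.4f} per 1M input")
```

At 70 to 90 percent hit rates the effective input price lands between roughly 4.4 and 1.7 cents per million tokens, against Mistral's flat $1.50.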
The benchmark surprise
Mistral published two benchmark numbers at launch. SWE-Bench Verified: 77.6%. τ³-Telecom agentic tool use: 91.4%. That's the entire benchmark disclosure for a frontier model release. No MMLU, no GPQA, no LiveCodeBench, no HumanEval, no AIME.
DeepSeek published ten. Eight of them line up in the side-by-side below, with unpublished entries marked:
| Benchmark | Mistral Medium 3.5 | DeepSeek V4 Flash |
|---|---|---|
| SWE-Bench Verified | 77.6% | 79.0% |
| τ³-Telecom (agentic) | 91.4% | Not published |
| MMLU | Not published | 88.7% |
| MMLU-Pro | Not published | 86.2% |
| GPQA Diamond | Not published | 88.1% |
| LiveCodeBench (Pass@1) | Not published | 91.6% |
| Codeforces rating | Not published | 3052 |
| Terminal-Bench 2.0 | Not published | 56.9% |
| HumanEval (Pass@1) | Not published | 69.5% |
Mistral source: official release notes and the Vibe agents announcement. DeepSeek source: V4 Flash HuggingFace model card.
On the only benchmark where both reported numbers, the cheaper model wins by 1.4 points. That isn't a small thing for a tier comparison. SWE-Bench Verified is the most-watched agentic coding benchmark right now, and Mistral chose to publish it knowing where it landed.
The honest read is that we don't know how Mistral Medium 3.5 performs on MMLU-Pro, GPQA, or LiveCodeBench. Mistral chose not to publish those numbers, and that choice is information.
Dense vs MoE: what each architecture actually buys you
Mistral Medium 3.5 is a 128B dense model. Every token of inference activates every parameter. DeepSeek V4 Flash is a 284B mixture-of-experts model with 13B active per token, plus FP4+FP8 mixed precision and the new DeepSeek Sparse Attention.
For self-hosting, the dense model is easier to reason about. 128B dense fits on four high-end GPUs with vLLM and behaves predictably under load. 284B MoE needs more memory for the full weight set even though only 13B activate per token, and routing overhead adds operational complexity.
For API consumption, none of that matters to your bill. The MoE architecture is exactly why DeepSeek can charge $0.14 per million input tokens. Active parameter count drives inference cost, and DeepSeek is using fewer active parameters per token than Mistral despite having more total weights.
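Back-of-envelope only, and assuming the common approximation that a transformer forward pass costs about 2 FLOPs per active parameter per token (ignoring attention and routing overhead), the compute gap looks like this:

```python
def flops_per_token(active_params_b):
    """Rough decode compute: ~2 FLOPs per active parameter per token."""
    return 2 * active_params_b * 1e9

mistral_dense = flops_per_token(128)  # 128B dense: every parameter active
deepseek_moe = flops_per_token(13)    # 284B total, but only 13B active

print(f"Mistral:  {mistral_dense:.1e} FLOPs/token")
print(f"DeepSeek: {deepseek_moe:.1e} FLOPs/token")
print(f"ratio:    {mistral_dense / deepseek_moe:.1f}x")  # 9.8x
```

A roughly 10x per-token compute gap is consistent with, though not the whole story behind, the roughly 11x gap on input pricing ($1.50 vs $0.14).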
Both ship with open weights. Mistral has multimodal input (text + image), which DeepSeek V4 Flash does not. If your application needs vision, Mistral Medium 3.5 has something DeepSeek does not, full stop. Whether that's worth 17x is the question.
Where Medium 3.5 still earns its tag
Vision is the easy answer. DeepSeek V4 Flash is text-only, so any workload involving screenshots, diagrams, structured data extraction from images, or document understanding with embedded charts has exactly one open-weight option in this tier and it's Medium 3.5. There is no negotiation here. Mistral wins by capability, period.
Self-hosting is the second case. A 128B dense model behaves predictably on a fixed GPU budget and is operationally simpler than a 284B MoE that needs the full weight set in memory plus routing logic. Both ship open weights, so the comparison isn't about API access, it's about which model your platform team would rather run at 3 AM on a Saturday.
Agentic tool use is the third case, with caveats. Mistral published 91.4% on τ³-Telecom. DeepSeek didn't run that benchmark, and τ³-Telecom isn't SWE-Bench, so the comparison is asymmetric. If your application's tool-call patterns look like that benchmark's, Mistral has the only number on file.
Outside those three cases, DeepSeek V4 Flash gets the work done at a fraction of the cost, and ships with more published evidence behind it.
So which one ships in production?
Mistral Medium 3.5 isn't a bad model. 77.6% on SWE-Bench Verified is competitive with what frontier models scored a year ago, and the dense architecture and multimodal input are genuine differentiators. The problem is what's on the other side of the price tag: DeepSeek V4 Flash at a fraction of the cost, with a longer benchmark sheet and a higher SWE-Bench score on the only shared row.
For text-only workloads, default to V4 Flash and eval Medium 3.5 against it on your actual data before agreeing to the premium. For tasks that need vision, Medium 3.5 wins by default, as the only vision-capable option in this tier. Anything in between depends on the benchmark numbers Mistral chose not to publish, which is a tougher position to defend than it sounds.
One nuance worth flagging: the Aider project runs a community polyglot coding benchmark that catches differences official model cards skip past. Watch that and Artificial Analysis as third-party numbers land for Medium 3.5 over the next two weeks before committing budget either way.
If you're evaluating both, our cost calculator takes input and output token volumes and gives you exact monthly numbers across all current model pricing.
Sources
- Mistral Medium 3.5 model card: docs.mistral.ai
- Mistral Vibe agents announcement (SWE-Bench score): mistral.ai/news
- Mistral Medium 3.5 on Hugging Face: huggingface.co/mistralai
- Mistral pricing on Artificial Analysis: artificialanalysis.ai
- DeepSeek API pricing (official): api-docs.deepseek.com
- DeepSeek V4 release notes: api-docs.deepseek.com/news/news260424
- DeepSeek V4 Flash on Hugging Face: huggingface.co/deepseek-ai
- Mistral Medium 3.5 launch coverage: marktechpost.com