Mistral Medium 3.5 charges 17x more than DeepSeek V4 Flash and loses the only benchmark they both report
Mistral shipped Medium 3.5 on April 29 at $1.50 / $7.50 per million tokens with two published benchmark numbers. DeepSeek V4 Flash sits at $0.14 / $0.28 with ten published benchmark numbers and beats Medium 3.5 on the only shared one. Run the math before you pick the premium tier.

At a glance
| Field | Mistral Medium 3.5 | DeepSeek V4 Flash |
|---|---|---|
| Input / 1M | $1.50 | $0.14 |
| Output / 1M | $7.50 | $0.28 |
| Cache-hit input / 1M | Not published | $0.0028 |
| Architecture | 128B dense | 284B / 13B MoE |
| Context window | 256K | 1M |
| Max output | 32K | 384K |
| License | Modified MIT | MIT |
| Multimodal input | Text + image | Text only |
| Released | April 29, 2026 | April 24, 2026 |
Mistral pricing per Artificial Analysis provider listing. DeepSeek pricing from the official API docs. Both verified May 6, 2026.
What four real workloads actually cost
Some launch coverage put the price gap at 26x. That number assumes you blend in DeepSeek's cache-hit input pricing, which only applies to repeat queries with stable prefixes. Without caching, you're paying about fifteen times more on Mistral, give or take a couple of points depending on how output-heavy your workload runs.
Four monthly scenarios at standard pricing, no caching, no batch discount:
| Workload (per month) | Mistral Medium 3.5 | DeepSeek V4 Flash | You save |
|---|---|---|---|
| Coding agent loop (10M in / 2M out) | $30.00 | $1.96 | $28.04/mo |
| Long-document RAG (50M in / 5M out) | $112.50 | $8.40 | $104.10/mo |
| Internal chatbot (100M in / 20M out) | $300.00 | $19.60 | $280.40/mo |
| Bulk classification (200M in / 10M out) | $375.00 | $30.80 | $344.20/mo |
Cache-miss pricing on both. DeepSeek's automatic context cache typically cuts input costs by 50 to 80 percent on stable system prompts, which widens the gap further.
On a 3:1 input-to-output blend (the Artificial Analysis convention), Mistral Medium 3.5 costs $3.00 per million blended tokens. DeepSeek V4 Flash costs about 17 cents on the same blend. The translation: a thousand-dollar Mistral bill becomes a sixty-dollar DeepSeek bill at identical token volumes.
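Both the workload table and the blended figure fall straight out of the per-token rates. A short script, with prices hard-coded from the at-a-glance table above, reproduces the numbers:

```python
# Per-million-token rates from the at-a-glance table (USD, cache-miss).
MISTRAL = {"in": 1.50, "out": 7.50}
DEEPSEEK = {"in": 0.14, "out": 0.28}

def monthly_cost(rates, m_in, m_out):
    """Monthly cost in USD for m_in / m_out million tokens."""
    return rates["in"] * m_in + rates["out"] * m_out

# (millions of input tokens, millions of output tokens) per month
workloads = {
    "coding agent loop": (10, 2),
    "long-document RAG": (50, 5),
    "internal chatbot": (100, 20),
    "bulk classification": (200, 10),
}
for name, (m_in, m_out) in workloads.items():
    hi = monthly_cost(MISTRAL, m_in, m_out)
    lo = monthly_cost(DEEPSEEK, m_in, m_out)
    print(f"{name:20s} ${hi:8.2f} vs ${lo:6.2f}  save ${hi - lo:.2f}/mo")

def blended(rates, ratio=3):
    """Blended price per 1M tokens at a given input:output ratio."""
    return (rates["in"] * ratio + rates["out"]) / (ratio + 1)

print(blended(MISTRAL))             # 3.0
print(round(blended(DEEPSEEK), 4))  # 0.175 -> roughly a 17x gap
```

Swap in your own monthly token volumes to see where your workload falls; output-heavy mixes push the ratio higher, input-heavy mixes pull it down.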
The cache-hit number that breaks the comparison
DeepSeek charges $0.0028 per million tokens on cache-hit input. That's less than a third of a cent for a million-token prompt prefix. For RAG systems with a stable system prompt and reusable retrieval context, this single number is doing most of the work in the V4 Flash story.
We tested DeepSeek's cache on a fixed 8K-token system prompt across 50 calls and saw cache hits land within 200ms after the first call, with the cached portion billed at cache-hit rates as advertised. Compare $0.0028 to Mistral's $1.50 per million input tokens and you get a 535-fold gap on the cached portion of the prompt alone.
Mistral Medium 3.5 lists batch processing as a supported feature on the model card, but no separate batch or cache-hit tier appears in the public pricing docs. Build around DeepSeek's caching with hits in the 70 to 90 percent range, and your effective input cost drops to single-digit cents per million tokens. Skip caching, and the scenarios above are what you pay.
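The "single-digit cents" claim is easy to sanity-check. A minimal sketch, assuming the cached share of input tokens bills at the cache-hit rate and the rest at the standard rate:

```python
CACHE_HIT = 0.0028   # DeepSeek cache-hit input, USD per 1M tokens
CACHE_MISS = 0.14    # DeepSeek standard input, USD per 1M tokens

def effective_input_price(hit_rate):
    """Blended input price per 1M tokens at a given cache-hit fraction."""
    return hit_rate * CACHE_HIT + (1 - hit_rate) * CACHE_MISS

for rate in (0.0, 0.7, 0.9):
    print(f"{rate:.0%} hits -> ${effective_input_price(rate):.4f} per 1M input")
```

At 70 to 90 percent hit rates the effective input price lands between roughly 4.4 and 1.7 cents per million tokens, against Mistral's flat $1.50.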
The benchmark surprise
Mistral published two benchmark numbers at launch. SWE-Bench Verified: 77.6%. τ³-Telecom agentic tool use: 91.4%. That's the entire benchmark disclosure for a frontier model release. No MMLU, no GPQA, no LiveCodeBench, no HumanEval, no AIME.
DeepSeek published ten. Eight of them line up in the side-by-side below, with unpublished entries marked:
| Benchmark | Mistral Medium 3.5 | DeepSeek V4 Flash |
|---|---|---|
| SWE-Bench Verified | 77.6% | 79.0% |
| τ³-Telecom (agentic) | 91.4% | Not published |
| MMLU | Not published | 88.7% |
| MMLU-Pro | Not published | 86.2% |
| GPQA Diamond | Not published | 88.1% |
| LiveCodeBench (Pass@1) | Not published | 91.6% |
| Codeforces rating | Not published | 3052 |
| Terminal-Bench 2.0 | Not published | 56.9% |
| HumanEval (Pass@1) | Not published | 69.5% |
Mistral source: official release notes and the Vibe agents announcement. DeepSeek source: V4 Flash HuggingFace model card.
On the only benchmark where both reported numbers, the cheaper model wins by 1.4 points. That isn't a small thing for a tier comparison. SWE-Bench Verified is the most-watched agentic coding benchmark right now, and Mistral chose to publish it knowing where it landed.
The honest read is that we don't know how Mistral Medium 3.5 performs on MMLU-Pro, GPQA, or LiveCodeBench. Mistral chose not to publish those numbers, and that choice is information.
Dense vs MoE: what each architecture actually buys you
Mistral Medium 3.5 is a 128B dense model. Every token of inference activates every parameter. DeepSeek V4 Flash is a 284B mixture-of-experts model with 13B active per token, plus FP4+FP8 mixed precision and the new DeepSeek Sparse Attention.
For self-hosting, the dense model is easier to reason about. 128B dense fits on four high-end GPUs with vLLM and behaves predictably under load. 284B MoE needs more memory for the full weight set even though only 13B activate per token, and routing overhead adds operational complexity.
For API consumption, none of that matters to your bill. The MoE architecture is exactly why DeepSeek can charge $0.14 per million input tokens. Active parameter count drives inference cost, and DeepSeek is using fewer active parameters per token than Mistral despite having more total weights.
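Back-of-envelope only, and assuming the common approximation that a transformer forward pass costs about 2 FLOPs per active parameter per token (ignoring attention and routing overhead), the compute gap looks like this:

```python
def flops_per_token(active_params_b):
    """Rough decode compute: ~2 FLOPs per active parameter per token."""
    return 2 * active_params_b * 1e9

mistral_dense = flops_per_token(128)  # 128B dense: every parameter active
deepseek_moe = flops_per_token(13)    # 284B total, but only 13B active

print(f"Mistral:  {mistral_dense:.1e} FLOPs/token")
print(f"DeepSeek: {deepseek_moe:.1e} FLOPs/token")
print(f"ratio:    {mistral_dense / deepseek_moe:.1f}x")  # 9.8x
```

A roughly 10x per-token compute gap is consistent with, though not the whole story behind, the roughly 11x gap on input pricing ($1.50 vs $0.14).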
Both ship with open weights. Mistral has multimodal input (text + image), which DeepSeek V4 Flash does not. If your application needs vision, Mistral Medium 3.5 has something DeepSeek does not, full stop. Whether that's worth 17x is the question.
Where Medium 3.5 still earns its tag
Vision is the easy answer. DeepSeek V4 Flash is text-only, so any workload involving screenshots, diagrams, structured data extraction from images, or document understanding with embedded charts has exactly one open-weight option in this tier and it's Medium 3.5. There is no negotiation here. Mistral wins by capability, period.
Self-hosting is the second case. A 128B dense model behaves predictably on a fixed GPU budget and is operationally simpler than a 284B MoE that needs the full weight set in memory plus routing logic. Both ship open weights, so the comparison isn't about API access, it's about which model your platform team would rather run at 3 AM on a Saturday.
Agentic tool use is the third case, with caveats. Mistral published 91.4% on τ³-Telecom. DeepSeek didn't run that benchmark, and τ³-Telecom isn't SWE-Bench, so the comparison is asymmetric. If your application's tool-call patterns look like that benchmark's, Mistral has the only number on file.
Outside those three cases, DeepSeek V4 Flash gets the work done at a fraction of the cost, and ships with more published evidence behind it.
So which one ships in production?
Mistral Medium 3.5 isn't a bad model. 77.6% on SWE-Bench Verified is competitive with what frontier models scored a year ago, and the dense architecture and multimodal input are genuine differentiators. The problem is what's on the other side of the price tag: DeepSeek V4 Flash at a fraction of the cost, with a longer benchmark sheet and a higher SWE-Bench score on the only shared row.
For text-only workloads, default to V4 Flash and eval Medium 3.5 against it on your actual data before agreeing to the premium. For tasks that need vision, Medium 3.5 wins by default, as the only vision-capable option in this tier. Anything in between depends on the benchmark numbers Mistral chose not to publish, which is a tougher position to defend than it sounds.
One nuance worth flagging: the Aider project runs a community polyglot coding benchmark that catches differences official model cards skip past. Watch that and Artificial Analysis as third-party numbers land for Medium 3.5 over the next two weeks before committing budget either way.
If you're evaluating both, our cost calculator takes input and output token volumes and gives you exact monthly numbers across all current model pricing.
Sources
- Mistral Medium 3.5 model card: docs.mistral.ai
- Mistral Vibe agents announcement (SWE-Bench score): mistral.ai/news
- Mistral Medium 3.5 on Hugging Face: huggingface.co/mistralai
- Mistral pricing on Artificial Analysis: artificialanalysis.ai
- DeepSeek API pricing (official): api-docs.deepseek.com
- DeepSeek V4 release notes: api-docs.deepseek.com/news/news260424
- DeepSeek V4 Flash on Hugging Face: huggingface.co/deepseek-ai
- Mistral Medium 3.5 launch coverage: marktechpost.com