TokenCost
Model Release · March 23, 2026 · 7 min read

Mistral Small 4: $0.15 per million input tokens for a multimodal MoE model

Mistral shipped a 119B-parameter model on March 16 that activates only ~6.5B parameters per token. It handles text and images, offers configurable reasoning, and costs 5x less than GPT-5.4 Mini on input. We dug into the pricing, the architecture, and whether the trade-offs make sense.

Mistral Small 4 model announcement

Image source: Mistral AI

TL;DR

  • Pricing: $0.15 / 1M input, $0.60 / 1M output. That's 5x cheaper than GPT-5.4 Mini on input and 7.5x cheaper on output.
  • Architecture: Mixture of Experts with 119B total parameters, ~6.5B active per token (128 experts, 4 active). 256K context window.
  • Multimodal: Native text + image input. Handles OCR, document parsing, visual analysis out of the box.
  • Reasoning: Configurable via reasoning_effort parameter. Set to "none" for fast chat, "high" for step-by-step thinking.
  • Open source: Apache 2.0 license. Self-host with 4x H100 GPUs or use the Mistral API.

What Mistral actually built

Mistral Small 4 is three models jammed into one. It merges Pixtral (their multimodal model), Magistral (reasoning), and Devstral (coding) into a single 119B-parameter MoE model. The API ID is mistral-small-2603.

Only ~6.5B parameters activate on any given token: 128 total experts with 4 active at a time. The result: you get 119B-class capability at roughly 6B-class inference costs. This is the same MoE trick NVIDIA used with Nemotron 3 Super, but Mistral pushes it harder, with a roughly 18:1 ratio of total to active parameters.

Mistral claims 40% faster end-to-end completion time and 3x more requests per second compared to Mistral Small 3. Those are throughput numbers, not quality numbers, but they matter if you're running this at scale.

Pricing breakdown

At $0.15 per million input tokens, Mistral Small 4 is the cheapest multimodal model from a major provider. The only models cheaper on input are Gemini 2.0 Flash-Lite ($0.075) and Mistral's own older Small 3.2 ($0.10), neither of which combines multimodal input with configurable reasoning.

| Model | Input / 1M | Output / 1M | Context | Multimodal |
|---|---|---|---|---|
| Mistral Small 4 | $0.15 | $0.60 | 256K | Yes |
| GPT-5.4 Nano | $0.20 | $1.25 | 400K | Yes |
| Gemini 3.1 Flash-Lite | $0.25 | $1.50 | 1M | Yes |
| DeepSeek V3.2 | $0.28 | $0.42 | 128K | No |
| GPT-5.4 Mini | $0.75 | $4.50 | 400K | Yes |
| Claude Haiku 4.5 | $1.00 | $5.00 | 200K | Yes |

Prices from Mistral docs and official provider pricing pages, retrieved March 23, 2026. Check our pricing page for the full list.

Why the MoE architecture matters for your bill

119B parameters sounds expensive to run. 6.5B active parameters doesn't. That gap is the whole point of Mixture of Experts. You get the knowledge of a large model with the inference cost of a small one.

Mistral runs 128 expert networks, but routes each token through only 4 of them. A learned routing layer picks which experts matter for each token: a coding token goes to coding-heavy experts, a French token to language-heavy ones. The other 124 experts sit idle for that token and cost no compute, though their weights still occupy memory.
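Top-k routing is simple enough to sketch. The toy router below (illustrative only, not Mistral's actual implementation; the hidden dimension is made up) scores all 128 experts for one token and keeps the 4 best:

```python
import numpy as np

def route_token(hidden, router_weights, top_k=4):
    """Sketch of top-k MoE routing for a single token.

    hidden:         (d_model,) token representation
    router_weights: (d_model, n_experts) learned routing matrix
    Returns indices of the chosen experts and their softmax mixing weights.
    """
    logits = hidden @ router_weights           # score all 128 experts
    top = np.argsort(logits)[-top_k:]          # keep only the 4 highest-scoring
    w = np.exp(logits[top] - logits[top].max())
    w /= w.sum()                               # renormalize over the chosen 4
    return top, w

rng = np.random.default_rng(0)
experts, mix = route_token(rng.standard_normal(64),
                           rng.standard_normal((64, 128)))
# Only the 4 experts in `experts` run; the other 124 are skipped entirely.
```

The token's output is then the mix-weighted sum of just those 4 expert outputs, which is where the compute savings come from.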

For self-hosting, that means you need 4x H100 GPUs minimum (the full model is ~242GB in BF16). Not cheap, but cheaper than hosting a 119B dense model, which would need 8-16 H100s. Most people will just use the API at $0.15/1M though.

The reasoning toggle

This is the part that caught our attention. Mistral Small 4 has a reasoning_effort parameter you can set per request.

reasoning_effort="none"

Fast chat mode. No chain-of-thought. Equivalent to Mistral Small 3.2 behavior. Use for classification, extraction, simple Q&A.

reasoning_effort="high"

Deep reasoning mode. Step-by-step thinking like Magistral. Use for math, complex analysis, multi-step problems. Costs more tokens but gets harder questions right.

This is useful because you don't pay for reasoning tokens on simple requests. A classification call with reasoning_effort="none" is pure speed. A math problem with reasoning_effort="high" takes longer but gets the answer right. Same model, same API, you just flip a parameter. OpenAI has a similar concept with their effort levels, but Mistral's is baked into a $0.15 model instead of a $2.50 one.
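As a sketch, a request might look like the following. The model ID and the reasoning_effort values come from Mistral's announcement; the exact payload shape is an assumption based on OpenAI-compatible chat APIs, so check Mistral's docs before copying:

```python
def build_request(prompt, effort="none"):
    """Assemble a chat request body. Payload shape is assumed (OpenAI-style);
    only the model ID and reasoning_effort values come from the announcement."""
    assert effort in ("none", "high")
    return {
        "model": "mistral-small-2603",
        "messages": [{"role": "user", "content": prompt}],
        "reasoning_effort": effort,
    }

# Speed path: classification with no chain-of-thought tokens billed.
fast = build_request("Classify this ticket: 'refund not received'")
# Reasoning path: pay for thinking tokens only where they help.
deep = build_request("Solve: how many primes are below 100?", effort="high")
```

The point of the sketch: the routing decision lives in your application code, per request, not in a model swap.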

What the benchmarks say

Mistral published chart-based comparisons rather than clean tables for most benchmarks. Here's what we could extract. The GPQA Diamond and MMLU-Pro scores come from a third-party review, not Mistral's official numbers, so take them with appropriate skepticism.

| Benchmark | Score | Source |
|---|---|---|
| AA LCR | 0.72 | Official |
| GPQA Diamond | 71.2% | Third-party |
| MMLU-Pro | 78.0% | Third-party |
| LiveCodeBench | > GPT-OSS 120B | Official (chart) |
| AIME 2025 | > GPT-OSS 120B | Official (chart) |

The LCR score of 0.72 is interesting because Mistral notes the model achieved it with only 1,600 characters of output, while comparable Qwen models needed 5,800-6,100 characters. That means lower output token costs for the same result. For a $0.60/1M output model, that efficiency compounds.
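To see roughly what that verbosity gap is worth, here's a quick conversion (assuming ~4 characters per token, a common English-text heuristic, and the character counts above):

```python
def output_cost(chars, price_per_million=0.60, chars_per_token=4):
    """Approximate output cost per response in dollars, assuming ~4 chars/token."""
    tokens = chars / chars_per_token
    return tokens * price_per_million / 1e6

mistral_resp = output_cost(1_600)  # Mistral Small 4's terse answers
qwen_resp = output_cost(5_800)     # ~3.6x more output spend per response
```

Fractions of a cent per response, but at millions of responses the 3.6x verbosity multiplier becomes real money.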

What this costs vs GPT-5.4 Mini

We ran the same workloads from our GPT-5.4 Mini vs Nano post through Mistral Small 4 pricing. The differences are large.

| Workload | Volume | Mistral Small 4 | GPT-5.4 Mini | Savings |
|---|---|---|---|---|
| Customer support | 2K input + 500 output, 5,000/day | $99/mo | $563/mo | 5.7x cheaper |
| Code review | 15K input + 3K output, 200/day | $24/mo | $149/mo | 6.2x cheaper |
| Data extraction | 5K input + 1K output, 10,000/day | $405/mo | $2,475/mo | 6.1x cheaper |

Math: Customer support = [(10M tokens × $0.15/1M) + (2.5M tokens × $0.60/1M)] × 30 days = ($1.50 + $1.50) × 30 = $90/mo. We add roughly 10% for reasoning token overhead, landing at $99. Run your own numbers with our cost calculator.
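Here's that math as a reusable function if you want to plug in your own workloads (default prices are Mistral Small 4's; the ~10% reasoning overhead buffer is not included):

```python
def monthly_cost(in_tokens, out_tokens, calls_per_day,
                 in_price=0.15, out_price=0.60, days=30):
    """Monthly API bill in dollars. Prices are $ per 1M tokens."""
    daily = (in_tokens * in_price + out_tokens * out_price) * calls_per_day / 1e6
    return daily * days

support = monthly_cost(2_000, 500, 5_000)                   # -> 90.0 (Mistral)
support_mini = monthly_cost(2_000, 500, 5_000, 0.75, 4.50)  # -> 562.5 (GPT-5.4 Mini)
```

Swap in the GPT-5.4 Mini prices from the table above to reproduce the comparison column.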

When Mistral Small 4 makes sense (and when it doesn't)

Good fit
  • High-volume document parsing with OCR
  • Budget multimodal pipelines (text + image)
  • Coding tasks where open source matters
  • Workloads that need both fast and deep modes
  • Self-hosted deployments (Apache 2.0)
Not a good fit
  • Computer use or browser automation (use GPT-5.4 Mini)
  • Tasks that need frontier-level reasoning (use GPT-5.4 or Claude)
  • Long context beyond 256K (GPT-5.4 Mini does 400K, Gemini does 1M)
  • Workloads where DeepSeek V3.2's $0.42 output beats $0.60

Honest take: if you don't need computer use or massive context windows, Mistral Small 4 undercuts basically everything at this quality level. The 256K context limit and lack of computer use support are the main reasons to pay more for GPT-5.4 Mini instead.

Bottom line

Mistral Small 4 at $0.15 per million input tokens is the cheapest way to get multimodal input and configurable reasoning in one model. The MoE architecture keeps inference fast despite the 119B parameter count. Apache 2.0 means you can self-host if you have the GPUs.

The limits are real: 256K context (not 400K or 1M), no computer use, and benchmark scores that sit below GPT-5.4 and Claude. But at 5-7x less than GPT-5.4 Mini, you're paying for a different tier of model and getting surprisingly close performance.

Compare it against everything else on our pricing page, or plug in your workload with the cost calculator.
