Model Release · March 23, 2026 · 9 min read

Xiaomi MiMo-V2-Pro: the trillion-parameter model that fooled everyone into thinking it was DeepSeek

For a week, developers were convinced an anonymous model called "Hunter Alpha" was a stealth DeepSeek V4 leak. It wasn't. It was Xiaomi. With 1 trillion+ parameters, 78% SWE-bench, and $1 per million input tokens, MiMo-V2-Pro makes a strong case for the most underrated model release of March 2026.


TL;DR

  • Pricing: $1.00 / 1M input, $3.00 / 1M output (up to 256K context). 3x cheaper than Claude Sonnet 4.6 on input, 5x cheaper on output.
  • Architecture: 1 trillion+ total parameters, 42B active per token. Mixture-of-Experts with 7:1 hybrid attention. 1M context window.
  • Coding: 78.0% SWE-bench Verified. That's 3rd globally, behind Claude Opus 4.6 (80.8%) and Sonnet 4.6 (79.6%).
  • Available on: OpenRouter (xiaomi/mimo-v2-pro) and direct at platform.xiaomimimo.com. Free until March 25.
  • Open source? No. The smaller MiMo-V2-Flash (309B total / 15B active) is Apache 2.0. V2-Pro weights are API-only for now.
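Getting a first response out of the model takes a few lines. The sketch below targets OpenRouter's OpenAI-compatible chat completions endpoint with the model id from the listing above; the prompt, token limit, and key placeholder are illustrative.

```python
# Minimal sketch: building a chat completion request for MiMo-V2-Pro
# against OpenRouter's OpenAI-compatible REST API. Endpoint and model id
# are from the article; everything else is an illustrative placeholder.

OPENROUTER_URL = "https://openrouter.ai/api/v1/chat/completions"

def build_request(prompt: str, api_key: str) -> tuple[dict, dict]:
    """Return (headers, payload) for one MiMo-V2-Pro completion call."""
    headers = {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
    }
    payload = {
        "model": "xiaomi/mimo-v2-pro",
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": 1024,
    }
    return headers, payload

headers, payload = build_request("Summarize this diff.", "sk-or-...")
# To send it: requests.post(OPENROUTER_URL, headers=headers, json=payload)
```

Any OpenAI-compatible client should also work by pointing its base URL at https://openrouter.ai/api/v1 and passing the same model id.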

The model that nobody could identify

On March 11, 2026, a model called "Hunter Alpha" appeared on OpenRouter with no company name, no press release, and a "stealth model" tag. Within days it was climbing the usage charts. Within a week it had processed over a trillion tokens. And practically everyone assumed it was DeepSeek.

The evidence felt compelling at the time. The model described itself as "a Chinese AI model primarily trained in Chinese." Its training data cutoff was May 2025 - identical to DeepSeek's reported cutoff. The architecture was MoE, which DeepSeek favors. And the person who would later be revealed as the lead developer, Luo Fuli, was a veteran of the DeepSeek R1 project. Every clue pointed to the same wrong answer.

On March 18, Xiaomi announced the full MiMo-V2 family and confirmed Hunter Alpha was an early build of MiMo-V2-Pro. Luo Fuli posted on X: "I call this a quiet ambush, not because we planned it, but because the shift from chat to agent paradigm happened so fast, even we barely believed it." The developers who had been using Hunter Alpha for a week got a good deal: it was free the entire time.

What Xiaomi actually built

MiMo-V2-Pro has over 1 trillion total parameters but only activates 42 billion per forward pass. The rest sit idle. That's the MoE deal: pay for routing, not for computation you don't use. The 7:1 hybrid attention ratio (dense attention applied to roughly 1 in 7 tokens, sparse on the rest) is what makes the 1M context window tractable without blowing out VRAM.
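The 7:1 ratio can be made concrete with a back-of-envelope sketch. The sliding-window size and the exact hybrid scheme below are assumptions (Xiaomi hasn't published those details); the point is only that giving full attention to roughly 1 in 7 positions and a local window to the rest cuts the per-token attention work by a large factor at 1M context.

```python
# Back-of-envelope: average attention score computations per query token
# under a 7:1 hybrid scheme vs pure dense attention. The 4096-token
# local window is an illustrative assumption, not a published spec.

def attended_pairs(context: int, dense_every: int = 7, window: int = 4096) -> float:
    """Average positions attended per token: 1-in-`dense_every` tokens
    attend to the full prefix, the rest attend to a local `window`."""
    dense = context / 2            # mean prefix length for full attention
    sparse = min(window, context)  # local window for the remaining tokens
    return dense / dense_every + sparse * (dense_every - 1) / dense_every

full = 1_048_576 / 2                 # pure dense baseline at 1M context
hybrid = attended_pairs(1_048_576)
print(f"hybrid attends to ~{hybrid / full:.1%} of the pairs dense would")
```

Under these assumptions the hybrid scheme does roughly 15% of the dense work at the full 1M window, which is the kind of saving that keeps KV memory and compute tractable.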

Spec                          Value
Total parameters              > 1 trillion
Active parameters per token   42 billion
Architecture                  Mixture-of-Experts
Attention                     Hybrid (7:1 dense:sparse)
Context window                1,048,576 tokens (1M)
Max output tokens             131,072
Knowledge cutoff              May 2025
Release date                  March 18, 2026

Xiaomi built MiMo specifically for agent workflows rather than chat. The predecessor, MiMo-V2-Flash (309B total / 15B active), is open source under Apache 2.0. V2-Pro is API-only for now, with Xiaomi saying weights will be released "once the model is stable." No timeline was given.

Pricing

Two tiers based on context length. Short contexts (up to 256K) are $1/$3 per million tokens. Longer contexts (up to 1M) are $2/$6. Most agentic coding tasks fit under 256K, so the lower tier is the relevant one.

Model                  Input / 1M   Output / 1M   Context
MiMo-V2-Pro (≤256K)    $1.00        $3.00         256K
MiMo-V2-Pro (≤1M)      $2.00        $6.00         1M
Claude Haiku 4.5       $1.00        $5.00         200K
GPT-5.4 Mini           $0.75        $4.50         400K
GPT-5.4                $2.50        $15.00        1M+
Claude Sonnet 4.6      $3.00        $15.00        200K
Gemini 3.1 Pro         $2.00        $12.00        1M
Claude Opus 4.6        $5.00        $25.00        200K

Prices from OpenRouter and Xiaomi MiMo platform, retrieved March 23, 2026. Check our pricing page for the full list.

Benchmark scores

Xiaomi focused the V2-Pro benchmarks on coding and agentic performance rather than general reasoning. The SWE-bench and ClawEval results are from Xiaomi's official launch materials. The Artificial Analysis index scores are from third-party evaluation.

Benchmark                      MiMo-V2-Pro   Claude Opus 4.6   Claude Sonnet 4.6
SWE-bench Verified             78.0%         80.8%             79.6%
ClawEval (agentic)             61.5          66.3
GPQA Diamond                   87.0%
HLE (Humanity's Last Exam)     28.3%
IFBench (instruction follow)   68.8%
AA Intelligence Index          49 (top 10)   higher            ~49
AA Agentic Index               62.8          higher
[Chart: Official Xiaomi benchmark comparison showing SWE-bench, ClawEval, and agentic scores vs Claude Opus 4.6 and GPT-5.2. Image credit: Xiaomi (via VentureBeat)]

The SWE-bench gap is small: 78% vs Opus 4.6's 80.8% and Sonnet 4.6's 79.6%. The more interesting number is the agentic index - 62.8 puts it in the 98th percentile of all models on that benchmark. For a model that costs the same as Claude Haiku 4.5 on input, that's a meaningful gap in capability.

Note: Xiaomi did not publish MMLU or HumanEval scores for V2-Pro. Those appear in the V1 MiMo-7B and V2-Flash technical reports but not this release. ClawEval scores conflict between sources. VentureBeat's detailed article cites 61.5 for MiMo-V2-Pro vs 66.3 for Claude Opus 4.6. The Decoder cites 81.0 vs 81.5, which may reflect a different evaluation protocol or benchmark version. The table above uses VentureBeat's figures.

Speed

OpenRouter routes through Xiaomi's own endpoint (fp8 quantized). As of March 23, no third-party inference providers have been added.

  • Throughput: 29 tok/sec
  • Time to first token: 3.54 sec
  • Uptime (launch week): 100%

29 tokens/second is slower than GPT-5.4 Mini and Gemini Flash on their fast-inference paths. For long agentic runs this matters less than for interactive chat. The 3.54 second TTFT is fine if you launch a task and come back for the result; it would feel slow in a streaming chat UI.
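The "launch a task and come back" intuition follows from a simple latency model: total time is roughly TTFT plus output tokens divided by throughput. A sketch using the launch-week numbers above (real runs will vary with load and prompt size):

```python
# Rough end-to-end latency estimate from the launch-week figures:
# total time ~ TTFT + output_tokens / throughput. A sketch, not a
# guarantee; actual latency varies with load and prompt length.

TTFT_S = 3.54      # time to first token, seconds
TOK_PER_S = 29.0   # sustained throughput, tokens/second

def run_time(output_tokens: int) -> float:
    """Approximate wall-clock seconds to generate `output_tokens`."""
    return TTFT_S + output_tokens / TOK_PER_S

print(f"{run_time(10_000):.0f}s for a 10K-token agentic result")
print(f"{run_time(200):.1f}s for a 200-token chat reply")
```

Under this model a 10K-token agentic run takes nearly six minutes, which is fine for a background task; a 200-token chat reply takes over ten seconds, which is where the TTFT starts to hurt.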

What this costs at scale

MiMo-V2-Pro is priced like a mid-tier model but performs closer to the frontier on coding tasks. Here's what that difference looks like in dollar terms for three real workloads.

Agentic coding: 100 tasks/day, 50K input + 10K output each
MiMo-V2-Pro: $240/mo vs Claude Sonnet 4.6: $900/mo

3.75x cheaper vs Sonnet 4.6. Math: (100 * 50K/1M * $1 + 100 * 10K/1M * $3) * 30 = $240.

Document analysis pipeline: 1,000 tasks/day, 15K input + 3K output
MiMo-V2-Pro: $720/mo vs Claude Sonnet 4.6: $2,700/mo

3.75x cheaper vs Sonnet 4.6. Math: (1000 * 15K/1M * $1 + 1000 * 3K/1M * $3) * 30 = $720.

Code review: 200 tasks/day, 25K input + 5K output
MiMo-V2-Pro: $240/mo vs Claude Sonnet 4.6: $900/mo

3.75x cheaper vs Sonnet. Against Claude Haiku 4.5 ($300/mo) it's about 1.25x cheaper.
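The three workloads above all reduce to the same formula, sketched here so you can swap in your own volumes and rates (30 billing days per month, matching the article's math):

```python
# Monthly cost model matching the workload examples above.
# Rates are USD per 1M tokens; 30 days per month as in the article.

def monthly_cost(tasks_per_day: int, in_tok: int, out_tok: int,
                 rate_in: float, rate_out: float, days: int = 30) -> float:
    """Monthly USD cost for a fixed daily workload."""
    daily = tasks_per_day * (in_tok / 1e6 * rate_in + out_tok / 1e6 * rate_out)
    return daily * days

# Agentic coding: 100 tasks/day, 50K input + 10K output each
mimo   = monthly_cost(100, 50_000, 10_000, 1.00, 3.00)   # MiMo-V2-Pro (<=256K)
sonnet = monthly_cost(100, 50_000, 10_000, 3.00, 15.00)  # Claude Sonnet 4.6
print(f"MiMo: ${mimo:,.0f}/mo, Sonnet: ${sonnet:,.0f}/mo, "
      f"{sonnet / mimo:.2f}x cheaper")
```

The 3.75x ratio holds across all three scenarios because it depends only on the rates and the input:output mix, not on task volume.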

Run your own numbers with our cost calculator.

When MiMo-V2-Pro makes sense

Good fit
  • Agentic coding pipelines where you're currently paying Sonnet 4.6 rates
  • SWE-bench-class tasks where Haiku-level quality isn't enough
  • Long context tasks up to 1M tokens (the 256K+ tier at $2/$6 is still competitive)
  • Workloads integrated with OpenClaw, Cline, or KiloCode (official support)
Not a good fit
  • Fast streaming chat (3.54s TTFT is noticeable)
  • Tasks that need computer use or browser automation
  • High-stakes coding where the 2-3% SWE-bench gap vs Opus matters
  • Self-hosting requirements (no open weights for Pro yet)
  • Budget pipelines competing against DeepSeek V3.2 ($0.28/M input)

The comparison that matters is against Claude Sonnet 4.6, not Haiku. They land within 2 points on SWE-bench, but MiMo is 3x cheaper on input and 5x cheaper on output. If you're running agentic coding infrastructure on Sonnet today, this is worth putting in your eval set.

The open source situation

MiMo-V2-Flash - the smaller version at 309B total / 15B active parameters - is fully open source under Apache 2.0. Weights are on Hugging Face at XiaomiMiMo. Community GGUF quantizations are already available from unsloth and bartowski if you want to run it locally.

V2-Pro is API-only. Xiaomi said weights will come "once the model is stable," which could mean weeks or months. Whether that happens - and when - is a guess. If self-hosting matters, use V2-Flash for now. It's a different model, but it's there.

Bottom line

MiMo-V2-Pro at $1 per million input tokens is hard to ignore for agentic coding work. You get within 3 points of Claude Opus 4.6 on SWE-bench at one-fifth the price. The Hunter Alpha story is a good hook, but the numbers are what matter here.

The limits are real: 29 tok/sec throughput is slow for interactive use, there are no open weights for V2-Pro, and Opus 4.6 still leads coding by a margin that will matter for harder tasks. But for pipelines where "nearly Opus-class coding at Haiku-class input prices" is the goal, this is worth testing before March 25 while access is still free.

Compare it against everything else on our pricing page, or plug in your workload with the cost calculator.

Sources