Xiaomi MiMo-V2-Pro: the trillion-parameter model that fooled everyone into thinking it was DeepSeek
For a week, developers were convinced an anonymous model called "Hunter Alpha" was a stealth DeepSeek V4 leak. It wasn't. It was Xiaomi. With 1 trillion+ parameters, 78% SWE-bench, and $1 per million input tokens, MiMo-V2-Pro makes a strong case for being the most underrated model release of March 2026.

TL;DR
- Pricing: $1.00 / 1M input, $3.00 / 1M output (up to 256K context). 3x cheaper than Claude Sonnet 4.6 on input, 5x cheaper on output.
- Architecture: 1 trillion+ total parameters, 42B active per token. Mixture-of-Experts with 7:1 hybrid attention. 1M context window.
- Coding: 78.0% SWE-bench Verified. That's 3rd globally behind Claude Opus 4.6 (80.8%) and Sonnet 4.6 (79.6%).
- Available on: OpenRouter (xiaomi/mimo-v2-pro) and direct at platform.xiaomimimo.com. Free until March 25.
- Open source? No. The smaller MiMo-V2-Flash (309B total / 15B active) is Apache 2.0. V2-Pro weights are API-only for now.
The model that nobody could identify
On March 11, 2026, a model called "Hunter Alpha" appeared on OpenRouter with no company name, no press release, and a "stealth model" tag. Within days it was climbing the usage charts. Within a week it had processed over a trillion tokens. And practically everyone assumed it was DeepSeek.
The evidence felt compelling at the time. The model described itself as "a Chinese AI model primarily trained in Chinese." Its training data cutoff was May 2025 - identical to DeepSeek's reported cutoff. The architecture was MoE, which DeepSeek favors. And the person who would later be revealed as the lead developer, Luo Fuli, was a veteran of the DeepSeek R1 project. Every clue pointed to the same wrong answer.
On March 18, Xiaomi announced the full MiMo-V2 family and confirmed Hunter Alpha was an early build of MiMo-V2-Pro. Luo Fuli posted on X: "I call this a quiet ambush, not because we planned it, but because the shift from chat to agent paradigm happened so fast, even we barely believed it." The developers who had been using Hunter Alpha for a week got a good deal: it was free the entire time.
What Xiaomi actually built
MiMo-V2-Pro has over 1 trillion total parameters but activates only 42 billion per forward pass. The rest sit idle. That's the MoE bargain: trillion-parameter capacity at a fraction of the per-token compute, with a learned router deciding which experts fire. The 7:1 hybrid attention ratio (roughly seven sparse-attention layers for every full-attention layer) is what makes the 1M context window tractable without blowing out VRAM.
| Spec | Value |
|---|---|
| Total parameters | > 1 trillion |
| Active parameters per token | 42 billion |
| Architecture | Mixture-of-Experts |
| Attention | Hybrid (7:1 sparse:dense) |
| Context window | 1,048,576 tokens (1M) |
| Max output tokens | 131,072 |
| Knowledge cutoff | May 2025 |
| Release date | March 18, 2026 |
Xiaomi built MiMo specifically for agent workflows rather than chat. Its smaller sibling in the family, MiMo-V2-Flash (309B total / 15B active), is open source under Apache 2.0. V2-Pro is API-only for now, with Xiaomi saying weights will be released "once the model is stable." No timeline was given.
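Since V2-Pro is API-only, the easiest route is OpenRouter's OpenAI-compatible chat completions endpoint. A minimal sketch; the model slug comes from the listing, while the prompt and API key are placeholders:

```python
import json
import urllib.request

# Build a chat completions request for MiMo-V2-Pro via OpenRouter.
# The prompt and key below are illustrative placeholders.
payload = {
    "model": "xiaomi/mimo-v2-pro",
    "messages": [{"role": "user", "content": "Refactor this function: ..."}],
    "max_tokens": 1024,
}
req = urllib.request.Request(
    "https://openrouter.ai/api/v1/chat/completions",
    data=json.dumps(payload).encode("utf-8"),
    headers={
        "Authorization": "Bearer YOUR_OPENROUTER_API_KEY",  # placeholder
        "Content-Type": "application/json",
    },
)
# response = urllib.request.urlopen(req)  # not executed in this sketch
```

Any OpenAI-compatible SDK works the same way by pointing its base URL at openrouter.ai.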
Pricing
Two tiers based on context length. Short contexts (up to 256K) are $1/$3 per million tokens. Longer contexts (up to 1M) are $2/$6. Most agentic coding tasks fit under 256K, so the lower tier is the relevant one.
| Model | Input / 1M | Output / 1M | Context |
|---|---|---|---|
| MiMo-V2-Pro (≤256K) | $1.00 | $3.00 | 256K |
| MiMo-V2-Pro (≤1M) | $2.00 | $6.00 | 1M |
| Claude Haiku 4.5 | $1.00 | $5.00 | 200K |
| GPT-5.4 Mini | $0.75 | $4.50 | 400K |
| GPT-5.4 | $2.50 | $15.00 | 1M+ |
| Claude Sonnet 4.6 | $3.00 | $15.00 | 200K |
| Gemini 3.1 Pro | $2.00 | $12.00 | 1M |
| Claude Opus 4.6 | $5.00 | $25.00 | 200K |
Prices from OpenRouter and Xiaomi MiMo platform, retrieved March 23, 2026. Check our pricing page for the full list.
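The two-tier scheme is simple to encode. A sketch of a per-request estimator, assuming the whole request is billed at its context tier's rate (the exact billing granularity isn't documented here):

```python
def mimo_cost_usd(input_tokens: int, output_tokens: int,
                  context_tokens: int) -> float:
    """Estimate MiMo-V2-Pro request cost from the published two-tier rates."""
    if context_tokens <= 256_000:        # short-context tier
        in_rate, out_rate = 1.00, 3.00   # $ per 1M tokens
    elif context_tokens <= 1_048_576:    # long-context tier
        in_rate, out_rate = 2.00, 6.00
    else:
        raise ValueError("context exceeds the 1M-token window")
    return input_tokens / 1e6 * in_rate + output_tokens / 1e6 * out_rate

# 50K in / 10K out within a 60K context: $0.08.
# The same tokens inside a 500K context cost double on both sides: $0.16.
```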
Benchmark scores
Xiaomi focused the V2-Pro benchmarks on coding and agentic performance rather than general reasoning. The SWE-bench and ClawEval results are from Xiaomi's official launch materials. The Artificial Analysis index scores are from third-party evaluation.
| Benchmark | MiMo-V2-Pro | Claude Opus 4.6 | Claude Sonnet 4.6 |
|---|---|---|---|
| SWE-bench Verified | 78.0% | 80.8% | 79.6% |
| ClawEval (agentic) | 61.5 | 66.3 | — |
| GPQA Diamond | 87.0% | — | — |
| HLE (Humanity's Last Exam) | 28.3% | — | — |
| IFBench (instruction follow) | 68.8% | — | — |
| AA Intelligence Index | 49 (top 10) | higher | ~49 |
| AA Agentic Index | 62.8 | higher | — |

Image credit: Xiaomi (via VentureBeat)
The SWE-bench gap is small: 78% vs Opus 4.6's 80.8% and Sonnet 4.6's 79.6%. The more interesting number is the agentic index - 62.8 puts it in the 98th percentile of all models on that benchmark. For a model that costs the same as Claude Haiku 4.5 on input, that's a meaningful gap in capability.
Note: Xiaomi did not publish MMLU or HumanEval scores for V2-Pro. Those appear in the V1 MiMo-7B and V2-Flash technical reports but not this release. ClawEval scores conflict between sources. VentureBeat's detailed article cites 61.5 for MiMo-V2-Pro vs 66.3 for Claude Opus 4.6. The Decoder cites 81.0 vs 81.5, which may reflect a different evaluation protocol or benchmark version. The table above uses VentureBeat's figures.
Speed
OpenRouter routes through Xiaomi's own endpoint (fp8 quantized). As of March 23, no third-party inference providers have been added.
Observed throughput is about 29 tokens/second, slower than GPT-5.4 Mini and Gemini Flash on their fast-inference paths, with a time to first token (TTFT) of 3.54 seconds. For long agentic runs this matters less than for interactive chat: if you launch a task and come back for the result, the latency is fine; in a streaming chat UI it would feel slow.
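At 29 tok/s, decode time dominates any long completion. A back-of-envelope estimate (my own helper, ignoring network overhead and queueing):

```python
def est_completion_seconds(output_tokens: int,
                           ttft: float = 3.54,
                           tok_per_s: float = 29.0) -> float:
    """Rough wall-clock time: time-to-first-token plus decode time
    at the observed throughput. Ignores network and queueing."""
    return ttft + output_tokens / tok_per_s

# A 500-token chat reply lands around 21 s; a 10K-token agent trace
# takes roughly six minutes.
```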
What this costs at scale
MiMo-V2-Pro is priced like a mid-tier model but performs closer to the frontier on coding tasks. Here's what that difference looks like in dollar terms for two sample workloads:
- 100 requests/day at 50K input / 10K output tokens each: (100 * 50K/1M * $1 + 100 * 10K/1M * $3) * 30 = $240/month, vs $900 on Sonnet 4.6 - 3.75x cheaper.
- 1,000 requests/day at 15K input / 3K output tokens each: (1,000 * 15K/1M * $1 + 1,000 * 3K/1M * $3) * 30 = $720/month, vs $2,700 on Sonnet 4.6 - again 3.75x cheaper.
Against Claude Haiku 4.5 the gap narrows: the first workload runs about $300/month on Haiku, making MiMo only about 1.25x cheaper.
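The monthly figures above can be scripted straight from the pricing table. A minimal sketch (the model labels are my own):

```python
RATES = {  # ($ per 1M input, $ per 1M output), from the pricing table
    "mimo-v2-pro": (1.00, 3.00),        # <=256K tier
    "claude-sonnet-4.6": (3.00, 15.00),
    "claude-haiku-4.5": (1.00, 5.00),
}

def monthly_cost(model: str, requests_per_day: int,
                 in_tok: int, out_tok: int, days: int = 30) -> float:
    """Monthly spend for a fixed daily request pattern."""
    in_rate, out_rate = RATES[model]
    per_request = in_tok / 1e6 * in_rate + out_tok / 1e6 * out_rate
    return requests_per_day * per_request * days

# 100 requests/day at 50K in / 10K out:
#   MiMo-V2-Pro ~$240/mo, Sonnet 4.6 ~$900/mo, Haiku 4.5 ~$300/mo
```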
Run your own numbers with our cost calculator.
When MiMo-V2-Pro makes sense
- Agentic coding pipelines where you're currently paying Sonnet 4.6 rates
- SWE-bench-class tasks where Haiku-level quality isn't enough
- Long-context tasks up to 1M tokens (the 256K+ tier at $2/$6 is still competitive)
- Workloads integrated with OpenClaw, Cline, or KiloCode (official support)
When it doesn't
- Fast streaming chat (the 3.54s TTFT is noticeable)
- Tasks that need computer use or browser automation
- High-stakes coding where the 2-3% SWE-bench gap vs Opus matters
- Self-hosting requirements (no open weights for Pro yet)
- Budget pipelines competing against DeepSeek V3.2 ($0.28/M input)
The comparison that matters is against Claude Sonnet 4.6, not Haiku. They land within 2 points on SWE-bench, but MiMo is 3x cheaper on input and 5x cheaper on output. If you're running agentic coding infrastructure on Sonnet today, this is worth putting in your eval set.
The open source situation
MiMo-V2-Flash - the smaller version at 309B total / 15B active parameters - is fully open source under Apache 2.0. Weights are on Hugging Face under the XiaomiMiMo organization. Community GGUF quantizations are already available from unsloth and bartowski if you want to run it locally.
V2-Pro is API-only. Xiaomi said weights will come "once the model is stable," which could mean weeks or months - whether and when that happens is anyone's guess. If self-hosting matters, use V2-Flash for now. It's a different model, but it's there.
Bottom line
MiMo-V2-Pro at $1 per million input tokens is hard to ignore for agentic coding work. You get within 3 points of Claude Opus 4.6 on SWE-bench at one-fifth the price. The Hunter Alpha story is a good hook, but the numbers are what matter here.
The limits are real: 29 tok/sec throughput is slow for interactive use, there are no open weights for V2-Pro, and Opus 4.6 still leads coding by a margin that will matter for harder tasks. But for pipelines where "nearly Opus-class coding at Haiku-class input prices" is the goal, this is worth testing before March 25 while access is still free.
Compare it against everything else on our pricing page, or plug in your workload with the cost calculator.
Sources
- Xiaomi MiMo: MiMo-V2-Pro product page (March 18, 2026)
- OpenRouter: xiaomi/mimo-v2-pro - live pricing and throughput
- Artificial Analysis: MiMo-V2-Pro Intelligence Index
- VentureBeat: "Xiaomi stuns with new MiMo-V2-Pro LLM" (March 18, 2026)
- The Decoder: Xiaomi MiMo launch coverage (March 18, 2026)
- Quasa.io: Hunter Alpha unmasked - timeline and reveal
- HuggingFace: XiaomiMiMo organization - V2-Flash open-source weights