Model Release · June 9, 2026 · 8 min read

Xiaomi MiMo UltraSpeed charges 3x for 10x the speed. The catch is you can only rent it for two weeks.

Xiaomi turned on a trial today, June 9, for MiMo-V2.5-Pro UltraSpeed, a one-trillion-parameter model it claims runs past 1,000 tokens a second on ordinary eight-GPU hardware. The price is set against itself: triple the standard MiMo rate for roughly ten times the throughput. That makes it the rare release where the question is not “is it cheap” but “is the speed worth the markup,” and whether you can even get in. We work the numbers per million tokens, line it up against the other fast models you can actually pay for, and flag where the marketing runs ahead of what is confirmed.

Image: long-exposure light streaks on a dark background, evoking very high token throughput. Photo by Kalea Morgan on Unsplash.

The short version

Trial pricing is roughly $1.31 input on a cache miss and $2.61 output per million tokens. Cache hits drop input to about a cent.
That is exactly 3x the standard MiMo-V2.5-Pro rate of $0.435 / $0.87, sold as the price of roughly 10x the speed.
Speed is listed at 500 to 1,000 tokens a second, with demos near 1,200. The 1,000 number is the headline, not the floor.
It is a two-week, application-gated trial (June 9 to June 23), not a model you can wire into production today.

Three numbers, and which one you actually pay

UltraSpeed has a split input price, which trips people up, so start there. A cache hit, where the prompt prefix is already in memory from a prior call, costs about $0.01 per million input tokens. A cache miss, the normal case for fresh prompts, costs about $1.31. Output is a flat $2.61. The honest figure to plan around is the cache-miss input, because most real traffic does not hit warm cache on every call.

Those dollar figures come from Xiaomi's native yuan rates (about ¥9 input, ¥18 output per million on a cache miss) converted at current rates, so treat the cents as approximate. What is exact is the ratio. The standard MiMo-V2.5-Pro bills ¥3 input and ¥6 output natively, which is $0.435 and $0.87 on OpenRouter. UltraSpeed is precisely three times each. Xiaomi is not pretending the speed is free; it is charging a clean 3x for it.
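The ratio can be checked directly from the quoted figures. This is a minimal sketch, assuming the exchange rate implied by the standard tier's own prices (¥3 billed as $0.435) applies to every figure; the variable names are illustrative and the rate is inferred, not an official quote.

```python
# Figures quoted in the text. The exchange rate is inferred, not an official quote.
STANDARD_CNY = {"input": 3.0, "output": 6.0}      # yuan per 1M tokens
ULTRASPEED_CNY = {"input": 9.0, "output": 18.0}   # yuan per 1M tokens, cache miss
STANDARD_USD_INPUT = 0.435                        # standard tier input price on OpenRouter

# Assumption: the rate that maps ¥3 to $0.435 holds for the other prices too.
implied_cny_per_usd = STANDARD_CNY["input"] / STANDARD_USD_INPUT


def to_usd(cny: float) -> float:
    """Convert a yuan price to dollars at the implied rate."""
    return cny / implied_cny_per_usd


ultraspeed_usd = {kind: to_usd(price) for kind, price in ULTRASPEED_CNY.items()}
ratios = {kind: ULTRASPEED_CNY[kind] / STANDARD_CNY[kind] for kind in STANDARD_CNY}

if __name__ == "__main__":
    print(f"implied rate: {implied_cny_per_usd:.3f} CNY per USD")
    for kind in ("input", "output"):
        print(f"{kind}: ${ultraspeed_usd[kind]:.3f} per 1M, {ratios[kind]:.0f}x standard")
```

At that implied rate the cache-miss input works out to $1.305, which rounds to the quoted $1.31. A different conversion rate would move the cents but not the 3x ratio.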

MiMo tier	Input / 1M	Output / 1M	Roughly how fast
UltraSpeed (trial)	~$1.31	~$2.61	500 to 1,000 tok/s
MiMo-V2.5-Pro standard	$0.435	$0.87	~10x slower
MiMo-V2-Pro (March)	$1.00	$3.00	Standard

The cache-hit input price is the genuinely wild number. At a cent per million tokens, a workload that replays a long shared system prompt across many calls pays almost nothing on the input side. That is the scenario Xiaomi clearly wants to show off. For everything else, budget for the cache-miss rate.
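To see how much the cache matters, here is a rough sketch of a blended input price. It assumes billing is a simple token-weighted mix of the two quoted rates, which Xiaomi has not confirmed; `effective_input_price` and `hit_rate` are hypothetical names for illustration.

```python
CACHE_HIT_USD = 0.01    # per 1M input tokens, approximate trial rate
CACHE_MISS_USD = 1.31   # per 1M input tokens, approximate trial rate


def effective_input_price(hit_rate: float) -> float:
    """Blended input price per 1M tokens.

    hit_rate is the fraction of input tokens served from cache (0.0 to 1.0).
    Assumes a linear, token-weighted mix of the hit and miss rates.
    """
    if not 0.0 <= hit_rate <= 1.0:
        raise ValueError("hit_rate must be between 0 and 1")
    return hit_rate * CACHE_HIT_USD + (1.0 - hit_rate) * CACHE_MISS_USD


if __name__ == "__main__":
    for rate in (0.0, 0.5, 0.9):
        price = effective_input_price(rate)
        print(f"{rate:.0%} cache hits -> ${price:.2f} per 1M input tokens")
```

Under that assumption, even a 50% hit rate roughly halves the input bill, which is why the shared-system-prompt case looks so good on paper.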

How fast is “fast,” in tokens a second

The headline is 1,000 tokens a second on a one-trillion-parameter model. Read the model page and it says 500 to 1,000, with demos peaking near 1,200. Both can be true; output speed swings with batch size and prompt shape. The fair way to hold it is a listed floor near 500 and a ceiling near 1,000-plus, not a flat 1,000, and neither end is guaranteed until someone outside Xiaomi measures it. Even the floor is a different category from the flagship models, which crawl by comparison.
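The gap between the low end and the headline is easier to feel as wait time. This is a small sketch, assuming a hypothetical 2,000-token response and ignoring queueing and time to first token; the three rates are the vendor figures quoted above, not measurements.

```python
def generation_seconds(output_tokens: int, tokens_per_second: float) -> float:
    """Seconds to stream a response, ignoring queueing and time to first token."""
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    return output_tokens / tokens_per_second


# Vendor-quoted rates from the text, not independent measurements.
RATES = {"listed low end": 500, "headline": 1000, "demo peak": 1200}

if __name__ == "__main__":
    response_tokens = 2000  # hypothetical response size, chosen for illustration
    for label, rate in RATES.items():
        seconds = generation_seconds(response_tokens, rate)
        print(f"{label} ({rate} tok/s): {seconds:.2f} s")
```

For that response size, the low end and the headline differ by two seconds, which can matter for an interactive tool and not at all for a batch job.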

Output speed, tokens per second (vendor and measured figures)

Model	Output speed (tok/s)
MiMo UltraSpeed	~1,000
Mercury 2	780
Gemini 3.5 Flash	184
GPT-5.5	(no value labeled)
Claude Opus 4.7	(no value labeled)

UltraSpeed is a vendor claim; Mercury 2 and Gemini 3.5 Flash are Artificial Analysis measurements; flagship speeds are typical observed rates. Cerebras-hosted open models run faster still (2,000-plus) but on wafer-scale custom silicon, not commodity GPUs.

The thing that makes the number interesting is not the raw figure, it is the hardware. Cerebras and Groq hit these speeds with custom chips. Xiaomi claims it got there on a single ordinary eight-GPU node using an inference engine called TileRT: FP4 quantization on the mixture-of-experts layers, a persistent kernel that kills per-operator launch overhead, and block-level speculative decoding. If that holds up under independent testing, the story is less “fastest model” and more “frontier speed without buying exotic silicon.”

The comparison that matters: speed against price

A model can be fast or cheap. The interesting ones are both. Here is UltraSpeed next to the fast models you can pay for by the token, with the standard MiMo tier as a baseline, and the output rate that actually governs streaming cost.

Model	Speed (tok/s)	Input / 1M	Output / 1M
MiMo UltraSpeed	500 to 1,000	~$1.31	~$2.61
Mercury 2	780	$0.25	$0.75
Gemini 3.5 Flash	184	$1.50	$9.00
MiMo-V2.5-Pro standard	~100	$0.435	$0.87

On pure speed per dollar, UltraSpeed loses. Mercury 2 is in the same speed class for a third of the output price, and Cerebras pushes open models past 2,000 tokens a second for around two dollars a million. So the honest pitch is narrow: UltraSpeed is for when you specifically want a frontier-class one-trillion-parameter model, with its coding and reasoning, running at this speed, and you would rather not stand up wafer-scale hardware to get there. Mercury 2 is a diffusion model tuned for throughput; UltraSpeed is a heavyweight made to sprint.
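One crude way to put a number on "speed per dollar" is tokens per second divided by the output price. The metric is an assumption made here for illustration, not one the vendors publish, and it ignores model quality entirely. The figures come from the comparison table, with UltraSpeed entered at both ends of its listed range.

```python
# (tokens per second, output USD per 1M tokens), from the comparison table.
MODELS = {
    "MiMo UltraSpeed (500 tok/s)": (500, 2.61),
    "MiMo UltraSpeed (1,000 tok/s)": (1000, 2.61),
    "Mercury 2": (780, 0.75),
    "Gemini 3.5 Flash": (184, 9.00),
    "MiMo-V2.5-Pro standard": (100, 0.87),
}


def speed_per_dollar(tok_per_s: float, output_usd_per_million: float) -> float:
    """Illustrative metric: tok/s bought per dollar of output price per 1M tokens."""
    return tok_per_s / output_usd_per_million


# Highest score first.
ranked = sorted(MODELS, key=lambda name: speed_per_dollar(*MODELS[name]), reverse=True)

if __name__ == "__main__":
    for name in ranked:
        print(f"{name}: {speed_per_dollar(*MODELS[name]):.0f}")
```

By this measure Mercury 2 comes out ahead of UltraSpeed even at UltraSpeed's headline rate, which is the sense in which UltraSpeed "loses" on pure speed per dollar.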

What a real month costs at the trial rate

Take a 70/30 input-output split and the cache-miss input price, since you cannot bank on warm cache across the board. Here is UltraSpeed against the standard tier so you can see exactly what the speed premium adds to the bill.

Monthly volume	UltraSpeed (trial)	MiMo standard	The speed tax
10M tokens	~$17	~$5.70	+$11
100M tokens	~$170	~$57	+$113
1B tokens	~$1,700	~$565	+$1,130

The speed tax is the whole decision. At a billion tokens a month you are paying about $1,130 extra for the faster tier. Whether that is worth it depends on what the latency buys you: a real-time coding assistant or a user-facing agent where waiting feels broken can justify it, while a nightly batch job that nobody watches almost never can. Run your own split through the calculator before you commit to the premium.
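The table's arithmetic is easy to rerun with your own split. This is a minimal sketch using the approximate trial prices quoted above and the cache-miss input rate; the function names and the `input_share` parameter are illustrative, and real bills will shift with cache hits and exchange rates.

```python
# USD per 1M tokens: (input on a cache miss, output). Approximate figures from the text.
PRICES = {
    "ultraspeed_trial": (1.31, 2.61),
    "mimo_standard": (0.435, 0.87),
}


def monthly_cost(total_tokens: float, tier: str, input_share: float = 0.70) -> float:
    """Monthly bill in USD for a token volume split between input and output."""
    input_price, output_price = PRICES[tier]
    millions = total_tokens / 1_000_000
    blended = input_share * input_price + (1.0 - input_share) * output_price
    return millions * blended


def speed_tax(total_tokens: float, input_share: float = 0.70) -> float:
    """Extra monthly spend for UltraSpeed over the standard tier."""
    return (monthly_cost(total_tokens, "ultraspeed_trial", input_share)
            - monthly_cost(total_tokens, "mimo_standard", input_share))


if __name__ == "__main__":
    for volume in (10_000_000, 100_000_000, 1_000_000_000):
        print(f"{volume:>13,} tokens: "
              f"UltraSpeed ${monthly_cost(volume, 'ultraspeed_trial'):,.2f}, "
              f"standard ${monthly_cost(volume, 'mimo_standard'):,.2f}, "
              f"tax ${speed_tax(volume):,.2f}")
```

At a 70/30 split the unrounded premium is about $1,134 at a billion tokens, which the table rounds to $1,130. Shifting the split toward output raises both bills, since output costs twice the cache-miss input rate on either tier.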

The fine print on getting in

Everything above assumes you can get in, and that is not a given. UltraSpeed is an application-gated trial running June 9 to June 23, with Xiaomi saying it will prioritize enterprises and professional developers with genuine workloads. There is no published general availability date and no word on what the permanent price will be once the trial ends. The 3x rate could be a launch discount or a launch markup; nobody outside Xiaomi knows yet.

A few more limits to plan around. Token Plan billing, the prepaid bundles Xiaomi sells for the standard models, does not apply to UltraSpeed during the trial. The free Chat trial caps you at ten queue entries a day, 30-minute sessions, and releases resources after about five idle minutes. So this is a window to benchmark the thing against your own workload, not a foundation to build production on this month. If the speed claims survive that benchmarking, the GA price is the number to watch for.

Is the quality actually there

UltraSpeed is the same base model as standard MiMo-V2.5-Pro, just quantized and accelerated, so its ceiling is whatever that model scores. On SWE-Bench Pro, the standard model posts 57.2%, which sits a hair under GPT-5.4's 57.7% and ahead of Claude Opus 4.6 at 53.4%, at a fraction of their output price. That is a genuinely strong coding result for an open-weight model, and it is why the speed angle matters: a fast model is only useful if it is also good.

The caveat is the FP4 quantization. Xiaomi says quality is “on par with the original,” but compressing the expert layers to four-bit precision is exactly the kind of change that can cost a point or two on hard tasks, and no third party has measured the accelerated build yet. The model launched yesterday. Until independent evaluations land, treat both the 1,000 tokens a second and the “no quality loss” as vendor claims that the trial exists to let you verify.

Rent it to benchmark, not to ship

If you are running a latency-sensitive agent or coding tool on a frontier-class model and the wait time is hurting the product, UltraSpeed is worth an application during the trial window. The pitch lands for exactly that case: you keep the one-trillion-parameter quality and roughly ten times the speed, on hardware your cloud already rents, for a 3x token bill you can model in advance. Benchmark it against your real prompts in the next two weeks while it is open.

If your workload is batch, or you mostly need cheap-and-fast rather than smart-and-fast, the premium is hard to justify. Mercury 2 covers throughput for about a third of the output price, the standard MiMo tier covers cost at a third of the price and roughly a tenth of the speed, and either skips the trial gating entirely. Either way, the right move is to put the numbers side by side. The full pricing table has every fast model in one place, so you can match throughput to budget before you fill out a trial form.

Sources

Xiaomi: MiMo-V2.5-Pro UltraSpeed model intro - 500 to 1,000 tok/s, cache-hit / cache-miss / output pricing, trial details
Xiaomi MiMo + TileRT: past 1,000 tokens per second - FP4 quantization, persistent kernel, speculative decoding, eight-GPU node
Gizmochina: MiMo UltraSpeed at 1,000 tokens per second - June 9 to 23 trial window, 3x price for 10x speed, access limits
OpenRouter: MiMo-V2.5-Pro - standard tier $0.435 input / $0.87 output, 1M context, SWE-Bench Pro 57.2%
Artificial Analysis: Mercury 2 - measured 780 tok/s, $0.25 input / $0.75 output per 1M
