Model Release · June 6, 2026 · 8 min read

Microsoft's MAI-Code-1-Flash matches GPT-5.4 Mini to the cent, then claims a third fewer tokens. You still can't call it through an API.

Microsoft shipped MAI-Code-1-Flash at Build on June 2. It is a 137B sparse-MoE coding model that GitHub lists at $0.75 input and $4.50 output per million tokens, the exact rate OpenAI charges for GPT-5.4 Mini. The pitch is that it solves harder coding tasks with up to 60% fewer tokens than Claude Haiku 4.5, and token count is the number that actually moves a bill. Three things to weigh before you get excited: the price lives on GitHub's pricing page while the model card still says it is unfinished, the benchmark win over Haiku only holds in Microsoft's own test harness, and the model runs nowhere except inside Copilot in VS Code.

The price you can see and the price Microsoft will admit to

Start with the awkward part. The $0.75/$4.50 rate everyone is quoting comes off GitHub's Copilot pricing page. Microsoft's own model card, published the same day, lists pricing as “to be finalized.” Both are official, and they disagree. That is not a rounding problem. It tells you what MAI-Code-1-Flash is right now: a token-billed model inside a product, not a model you provision against a rate card. There is no API to call, so the per-million number is what Copilot meters, not what you wire into your own stack.

Hold that next to the benchmark claim. Microsoft says MAI beats Claude Haiku 4.5 on coding, and in its numbers it does. Anthropic's own published Haiku score is higher than MAI's. Both can be true, because they ran different harnesses. The honest read is that the sticker rate and the leaderboard win are each softer than the headline, and the one claim that survives the scrutiny is the one about tokens.

Where the rate lands in the small-model bracket

The cheap coding tier got crowded this spring, and MAI walks in at a price that is not new. It is the same $0.75/$4.50 OpenAI set for GPT-5.4 Mini, down to the cached-input discount. Here is the bracket, sorted by input price.

Model	Input / 1M	Cached / 1M	Output / 1M	Context
GPT-5.1-codex-mini	$0.25	$0.025	$2.00	272K
MAI-Code-1-Flash	$0.75	$0.075	$4.50	256K
GPT-5.4 Mini	$0.75	$0.075	$4.50	272K
Claude Haiku 4.5	$1.00	$0.10	$5.00	200K
Gemini 3.5 Flash	$1.50	$0.15	$9.00	1M

On sticker price alone MAI is unremarkable. GPT-5.1-codex-mini undercuts it by two thirds on input, GPT-5.4 Mini ties it exactly, and only Haiku and Gemini Flash sit above. If Microsoft were selling raw tokens this would be a non-event. It is not selling raw tokens. It is selling a model that claims to need fewer of them, which is where the math gets interesting.
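To make the cached-input column concrete, here is a minimal sketch of a blended input rate. The rates are copied from the table above; the function name `effective_input_rate` and the `cache_hit_rate` parameter are illustrative assumptions, and nothing here confirms how Copilot actually applies cached pricing to a given session.

```python
# USD per 1M tokens, copied from the table above: (input, cached input, output).
RATE_CARD = {
    "GPT-5.1-codex-mini": (0.25, 0.025, 2.00),
    "MAI-Code-1-Flash":   (0.75, 0.075, 4.50),
    "GPT-5.4 Mini":       (0.75, 0.075, 4.50),
    "Claude Haiku 4.5":   (1.00, 0.10, 5.00),
    "Gemini 3.5 Flash":   (1.50, 0.15, 9.00),
}


def effective_input_rate(model: str, cache_hit_rate: float) -> float:
    """Blended input price per 1M tokens when a share of input is served from cache.

    cache_hit_rate is a hypothetical workload parameter, not a published figure.
    """
    if not 0.0 <= cache_hit_rate <= 1.0:
        raise ValueError("cache_hit_rate must be between 0 and 1")
    fresh, cached, _output = RATE_CARD[model]
    return cache_hit_rate * cached + (1.0 - cache_hit_rate) * fresh


for model in RATE_CARD:
    rate = effective_input_rate(model, 0.5)
    print(f"{model:20s} ${rate:.4f} per 1M input tokens at 50% cache hits")
```

Every model in the bracket lists cached input at a tenth of the fresh rate, so caching lowers all of the bills without changing the ordering.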

The token count is the actual product

Microsoft published its coding scores next to the average tokens each task burned, and that second column is the one worth reading. Against Haiku 4.5 in the same harness, MAI closed tasks with meaningfully less spend almost everywhere. A pass rate tells you whether the work got done. The token count tells you what it cost to get there.

Benchmark	MAI pass / tokens	Haiku 4.5 pass / tokens
SWE-Bench Verified	71.6% / 10.8K	66.6% / 15.3K
SWE-Bench Pro	51.2% / 28.0K	35.2% / 41.6K
SWE-Bench Multilingual	65.5% / 21.6K	62.7% / 27.3K
Terminal Bench 2	54.8% / 28.0K	27.3% / 25.0K

On SWE-Bench Verified that is 10.8K tokens against 15.3K, roughly 29% fewer, while scoring five points higher. On SWE-Bench Pro the gap widens to a third fewer tokens for a 16-point lead. Terminal Bench is the honest exception: MAI burned slightly more there (28K against 25K) but doubled the pass rate, so the cost-per-completed-task still favors it. None of these rows reaches Microsoft's “up to 60% fewer tokens” line; the widest gap here is about a third. Read that figure as a best case, not an average, and remember every cell here was measured by the vendor.
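The percentages above can be reproduced with a few lines of arithmetic. This is a rough sketch using the vendor-reported figures from the table. The "tokens per solved task" metric is an assumption of this example: it divides average tokens by pass rate, which presumes failed attempts burn about as many tokens as successful ones.

```python
# Vendor-reported figures from the table above: (pass rate, average tokens per task).
RESULTS = {
    "SWE-Bench Verified":     {"mai": (0.716, 10_800), "haiku": (0.666, 15_300)},
    "SWE-Bench Pro":          {"mai": (0.512, 28_000), "haiku": (0.352, 41_600)},
    "SWE-Bench Multilingual": {"mai": (0.655, 21_600), "haiku": (0.627, 27_300)},
    "Terminal Bench 2":       {"mai": (0.548, 28_000), "haiku": (0.273, 25_000)},
}


def token_reduction(mai_tokens: int, haiku_tokens: int) -> float:
    """Share of tokens MAI saves relative to Haiku; negative means MAI used more."""
    return 1.0 - mai_tokens / haiku_tokens


def tokens_per_solved_task(pass_rate: float, avg_tokens: int) -> float:
    """Expected tokens spent per completed task (assumes failures cost the same)."""
    return avg_tokens / pass_rate


for name, row in RESULTS.items():
    (mai_pass, mai_tok), (haiku_pass, haiku_tok) = row["mai"], row["haiku"]
    print(
        f"{name:24s} token reduction {token_reduction(mai_tok, haiku_tok):+.1%}  "
        f"tokens per solve: MAI {tokens_per_solved_task(mai_pass, mai_tok):,.0f} "
        f"vs Haiku {tokens_per_solved_task(haiku_pass, haiku_tok):,.0f}"
    )
```

Under that assumption, Terminal Bench 2 shows a negative reduction per attempt but still favors MAI per solved task, roughly 51K tokens against roughly 92K.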

Whose harness, and why it changes the answer

The benchmark table above runs every model through Microsoft's GitHub Copilot harness, including the competitors. That is a legitimate way to keep conditions equal, but it produces numbers that do not match what the other labs publish. The clearest case is Haiku 4.5 on SWE-Bench Verified.

SWE-Bench Verified, Haiku 4.5	Score	Verdict vs MAI (71.6%)
Microsoft's Copilot harness	66.6%	MAI wins by 5 points
Anthropic's published figure	73.3%	Haiku wins by 1.7 points

Flip the source and the headline flips with it. We are not saying Microsoft cooked the number. Different scaffolding, retry budgets, and thinking allowances move SWE-Bench by several points routinely, which is exactly why a single vendor-run chart cannot settle a cross-lab ranking. Until someone neutral runs MAI, treat “beats Haiku” as true-in-Copilot and unproven everywhere else. The token-efficiency gap is the sturdier claim, because both models faced the same harness when it was measured.

What a workload actually costs

Take the rates at face value first, before any efficiency adjustment. Four workloads at list price, rounded to the nearest cent for the per-task rows and the nearest dollar for the monthly row. Because MAI and GPT-5.4 Mini share a rate card, their columns are identical by construction, which is the point.

Workload	MAI / 5.4 Mini	Haiku 4.5	Gemini 3.5 Flash	codex-mini
Quick fix (10K in / 3K out)	$0.02	$0.03	$0.04	$0.01
Single task (50K in / 15K out)	$0.11	$0.13	$0.21	$0.04
Agent session (200K in / 40K out)	$0.33	$0.40	$0.66	$0.13
300M tokens/mo (70/30 in/out)	$563	$660	$1,125	$233

At equal token counts MAI saves you about 15% against Haiku and roughly half against Gemini Flash, while codex-mini stays the cheapest sticker in the room. Now layer the efficiency claim on top. If MAI really runs a coding task in 29% fewer tokens than Haiku, its effective per-task cost falls to about 60% of Haiku's (0.71 × 0.85). Codex-mini is a harder target: on a 70/30 mix its blended rate is about 41% of MAI's, so MAI would need nearly 60% fewer tokens than codex-mini on the same job to undercut it, and nobody has shown how many tokens codex-mini needs. That is the whole bet: not the rate, the count. It only pays off if the efficiency holds outside Microsoft's harness, which is unproven.
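The workload table can be recomputed from the list prices with a short sketch. The rates come from the pricing table earlier in the piece. The helper names `workload_cost` and `break_even_token_ratio` are hypothetical, and the break-even figure assumes both models see the same input/output mix.

```python
# List prices in USD per 1M tokens: (input, output), from the pricing table.
RATES = {
    "MAI / 5.4 Mini":   (0.75, 4.50),
    "Haiku 4.5":        (1.00, 5.00),
    "Gemini 3.5 Flash": (1.50, 9.00),
    "codex-mini":       (0.25, 2.00),
}


def workload_cost(model: str, input_tokens: float, output_tokens: float) -> float:
    """Dollar cost of a workload at list price, ignoring cached-input discounts."""
    in_rate, out_rate = RATES[model]
    return (input_tokens * in_rate + output_tokens * out_rate) / 1_000_000


def blended_rate(model: str, input_share: float = 0.7) -> float:
    """Price per 1M tokens for a fixed input/output mix."""
    in_rate, out_rate = RATES[model]
    return input_share * in_rate + (1.0 - input_share) * out_rate


def break_even_token_ratio(model: str, cheaper: str, input_share: float = 0.7) -> float:
    """Share of the cheaper model's tokens `model` can use and still cost the same."""
    return blended_rate(cheaper, input_share) / blended_rate(model, input_share)


monthly_in, monthly_out = 210_000_000, 90_000_000  # 300M tokens at 70/30
for model in RATES:
    print(f"{model:18s} ${workload_cost(model, monthly_in, monthly_out):,.2f} per month")

ratio = break_even_token_ratio("MAI / 5.4 Mini", "codex-mini")
print(f"MAI matches codex-mini on cost at {ratio:.0%} of codex-mini's tokens")
```

Changing `input_share` moves the break-even point, since codex-mini's discount is steeper on input than on output.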

The part that caps the upside: you can't actually deploy it

Everything above assumes you can point a workload at MAI-Code-1-Flash. For now you cannot. It runs only inside GitHub Copilot in Visual Studio Code, rolling out to a slice of users on the Free, Student, Pro, Pro+, and Max plans. There is no public endpoint, no Azure AI Foundry listing, and no confirmed third-party host, whatever some secondary write-ups claim. A Copilot CLI build is on the roadmap for a later phase. The model card is explicit that an API release, if it happens, would ship with its own docs, which is corporate for “not yet.”

So the $0.75/$4.50 number is real in the sense that Copilot will meter your usage against it, and theoretical in the sense that you cannot build on it. If you live in VS Code and Copilot, MAI is a free swap in the model picker that may quietly lower the token bill your employer pays. If you are an API team comparing it against GPT-5.4 Mini or codex-mini for a production agent, there is nothing to integrate against today. That gap, not the price, is the real story of this launch.

What 137B total and 5B active buys

The architecture explains the efficiency pitch. MAI-Code-1-Flash is a sparse Mixture-of-Experts transformer with 137 billion total parameters but only 5 billion active per token, trained from a mid-training checkpoint of Microsoft's MAI-Thinking-1 reasoning model. The 5B figure that floated through some headlines is the active parameter count, not the size of the whole model. A sparse 5B-active path is what lets it serve fast and cheap while carrying the knowledge of a much larger network, and a coding model fine-tuned off a reasoning base is a reasonable way to get the careful, fewer-tokens behavior the benchmarks show. Context tops out at 256K, below GPT-5.4 Mini's 272K and well under Gemini Flash's million, which matters for whole-repo passes but not for the task-sized agent loops this model is built for.
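For a sense of scale, here is a back-of-the-envelope sketch of what 5B active out of 137B total means for per-token compute. The parameter counts are the ones quoted above. The 2-FLOPs-per-active-parameter figure is a common rule of thumb assumed for this example, not something Microsoft published, and it ignores attention and routing overhead.

```python
TOTAL_PARAMS = 137e9   # total parameters, as quoted above
ACTIVE_PARAMS = 5e9    # parameters active per token, as quoted above


def active_fraction(active: float, total: float) -> float:
    """Share of the network's weights that participate in one token's forward pass."""
    return active / total


def forward_flops_per_token(active_params: float) -> float:
    """Rough forward-pass cost: about 2 FLOPs per active parameter (assumed rule of thumb)."""
    return 2.0 * active_params


sparse = forward_flops_per_token(ACTIVE_PARAMS)
dense_equivalent = forward_flops_per_token(TOTAL_PARAMS)

print(f"active fraction: {active_fraction(ACTIVE_PARAMS, TOTAL_PARAMS):.1%}")
print(f"sparse forward pass: {sparse:.2e} FLOPs per token")
print(f"a dense 137B model would need about {dense_equivalent / sparse:.1f}x that")
```

The saving is in compute per token, not in memory: all 137B parameters still have to be held somewhere for the router to choose among them.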

Worth a free swap, not a migration plan

If you are already in Copilot, there is no reason not to try MAI-Code-1-Flash on your next ticket. It costs you nothing extra to pick from the model menu, the token-efficiency evidence is the most credible part of the launch, and on Microsoft's own coding tasks it cleared more work with less spend than Haiku 4.5. That is a real, if narrow, result.

What you cannot do yet is plan around it. The rate is on a pricing page the model card contradicts, the benchmark lead evaporates the moment you switch to Anthropic's own Haiku number, and there is no API to build a service on. A team weighing GPT-5.4 Mini against codex-mini for a production agent has two shipping options and one VS Code demo. The smart move is to watch for the API release and the first neutral SWE-Bench run, and let those two events decide whether the fewer-tokens story holds outside the harness that produced it.

Line MAI's rate card up against the models you can actually call on the full pricing table, or run your own coding token mix through the calculator to see where the efficiency would have to land to matter.

Sources

Microsoft: Introducing MAI-Code-1-Flash - Launch post, “up to 60% fewer tokens” claim
Microsoft: MAI-Code-1-Flash model card - 256K context, 137B/5B sparse MoE, benchmark tables, “pricing to be finalized,” VS Code-only availability
GitHub Changelog: MAI-Code-1-Flash in Copilot - Plan availability and June 2 rollout date
Implicator: Copilot rollout and pricing - $0.75 / $0.075 / $4.50 per 1M off GitHub's pricing page, 137B-vs-5B card discrepancy
Anthropic: Claude Haiku 4.5 - Official 73.3% SWE-Bench Verified figure and $1.00/$5.00 pricing
OpenAI: GPT-5.4 Mini - $0.75 / $0.075 / $4.50 per 1M, 272K context
