Model Release · April 5, 2026 · 8 min read

Gemma 4 is out: $0.14 per million tokens for a 31B model scoring 89% on AIME

Google DeepMind dropped four open-weight models on April 2. The headline number: Gemma 4 31B scores 89.2% on AIME 2026 and 84.3% on GPQA Diamond, available on OpenRouter for $0.14 per million input tokens. Gemma 3 27B scored 20.8% on AIME. That's not a typo.

*Close-up of a dark circuit board with teal lighting. Photo by Adi Goldstein on Unsplash.*

Four models, two architectures

Gemma 4 ships as four instruction-tuned models, all Apache 2.0 licensed. Two are dense transformers, two use a per-layer embedding trick that makes them smaller than their parameter count suggests. All four have native reasoning mode built in - you toggle it on, no separate model needed.

| Model | Architecture | Total params | Active params | Context | Modalities |
|---|---|---|---|---|---|
| Gemma 4 31B | Dense | 30.7B | 30.7B | 256K | Text, image, video |
| Gemma 4 26B A4B | MoE | 25.2B | 3.8B | 256K | Text, image, video |
| Gemma 4 E4B | Dense | 8B | 4.5B | 128K | Text, image, audio |
| Gemma 4 E2B | Dense | 5.1B | 2.3B | 128K | Text, image, audio |

"E" models use Per-Layer Embeddings for on-device efficiency. "A4B" = 3.8B active parameters per token in the MoE architecture. All models support native system prompts and function calling.

The split between the 31B and 26B MoE is the one worth paying attention to. The MoE activates only 3.8B parameters per forward pass but matches the 31B on most benchmarks. It runs at roughly the same speed as a 4B model while producing output quality that would have been frontier-class a year ago.
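The throughput claim is just arithmetic: decode compute scales with active parameters (roughly 2 FLOPs per active parameter per generated token), not total parameters. A back-of-envelope comparison using the counts from the table above:

```python
def flops_per_token(active_params_b: float) -> float:
    """Approximate forward-pass FLOPs per generated token: ~2 * active params."""
    return 2 * active_params_b * 1e9

dense_31b = flops_per_token(30.7)  # Gemma 4 31B: all parameters active
moe_26b = flops_per_token(3.8)     # Gemma 4 26B A4B: 3.8B active per token

speedup = dense_31b / moe_26b
print(f"MoE does ~{speedup:.1f}x less compute per token than the 31B dense model")
```

Roughly 8x less compute per token, which is why the MoE decodes at 4B-class speed despite carrying 25.2B parameters in memory.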

What it costs on OpenRouter

Gemma 4 is open-weight, so self-hosting costs depend on your hardware. For API access, OpenRouter has the 31B and 26B MoE live. The E4B and E2B aren't on providers yet. Google hasn't added Gemma 4 to its managed API (Vertex AI still lists Gemma 3 variants only).

| Model | Input / 1M | Output / 1M | Context | Provider |
|---|---|---|---|---|
| Gemma 4 31B | $0.14 | $0.40 | 256K | OpenRouter |
| Gemma 4 26B A4B | $0.13 | $0.40 | 256K | OpenRouter |
| Gemma 3 27B IT | $0.10 | $0.10 | 128K | OpenRouter |
| Llama 4 Scout | $0.08 | $0.30 | 512K | OpenRouter |
| DeepSeek V3.2 | $0.28 | $0.42 | 128K | DeepSeek API |
| GPT-5.4 Nano | $0.20 | $1.25 | 400K | OpenAI |

Prices as of April 5, 2026. OpenRouter pricing varies by underlying provider. Gemma 3 27B and Llama 4 Scout included for context.

The cost story is the MoE undercutting the dense model ($0.13 vs $0.14 input) despite similar benchmark scores. With only 3.8B active parameters, inference providers can fit more concurrent requests on the same GPU, and they pass those savings on. If you care about cost and can accept a ~1-point benchmark sacrifice on AIME, the 26B MoE is the better pick.

Compared to other budget models: Gemma 4 26B MoE at $0.13 input is five cents more than Llama 4 Scout ($0.08) but posts significantly higher reasoning scores. It costs less than half what DeepSeek V3.2 charges ($0.28). Against GPT-5.4 Nano ($0.20 input, $1.25 output) the input gap is smaller, but Nano's output cost is roughly triple.
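To put the per-token differences in workload terms, here's a quick cost model over the prices in the table above. The traffic numbers (50M input, 10M output tokens per month) are illustrative assumptions, not anyone's real workload:

```python
PRICES = {  # $ per 1M tokens (input, output), from the pricing table above
    "gemma-4-31b":     (0.14, 0.40),
    "gemma-4-26b-a4b": (0.13, 0.40),
    "llama-4-scout":   (0.08, 0.30),
    "deepseek-v3.2":   (0.28, 0.42),
    "gpt-5.4-nano":    (0.20, 1.25),
}

def monthly_cost(model: str, input_mtok: float, output_mtok: float) -> float:
    """Dollar cost for a month of traffic, expressed in millions of tokens."""
    inp, out = PRICES[model]
    return input_mtok * inp + output_mtok * out

# Hypothetical workload: 50M input + 10M output tokens per month.
for model in PRICES:
    print(f"{model:16s} ${monthly_cost(model, 50, 10):6.2f}/month")
```

At that volume the 26B MoE comes to about $10.50/month against $22.50 for GPT-5.4 Nano - the output-token price is what separates them, not the headline input rate.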

Benchmark scores: what happened between Gemma 3 and 4

We don't normally get generational jumps like this in open models. The numbers below are all from instruction-tuned variants with reasoning mode enabled.

| Benchmark | Gemma 4 31B | 26B MoE | Gemma 3 27B | Jump |
|---|---|---|---|---|
| AIME 2026 (math competition) | 89.2% | 88.3% | 20.8% | +68.4 pts |
| GPQA Diamond (graduate science) | 84.3% | 82.3% | 42.4% | +41.9 pts |
| MMLU Pro (broad knowledge) | 85.2% | 82.6% | 67.6% | +17.6 pts |
| LiveCodeBench v6 (code generation) | 80.0% | 77.1% | 29.1% | +50.9 pts |
| Codeforces ELO (competitive coding) | 2150 | 1718 | 110 | +2040 |
| MMMU Pro (vision reasoning) | 76.9% | 73.8% | 49.7% | +27.2 pts |

Scores from Google DeepMind's Gemma page and HuggingFace model cards. All instruction-tuned with reasoning mode. Gemma 3 scores from Google's published benchmarks.

The Codeforces ELO jump from 110 to 2150 is the one that made people stop scrolling. A 2150 rating puts Gemma 4 31B in the Master tier on Codeforces, up from below Newbie for Gemma 3. On the HuggingFace discussion page, one commenter put it well: the model didn't release, it escaped.

GPQA Diamond at 84.3% from a 31B open model is something to sit with. For reference, our reasoning models comparison showed DeepSeek R1 at 81.0% and o4-mini at 81.4% on the same benchmark. Gemma 4 beats both. The R1 costs $0.55/M input. o4-mini costs $1.10/M. Gemma 4 costs $0.14/M.

The MoE trailing by 1-3 points across the board is fine - arguably even ideal from a cost perspective. You lose almost nothing and gain inference speed. The exception is Codeforces ELO where the gap is wider (2150 vs 1718), which suggests the full dense model handles highly competitive coding problems better.

The MoE variant is the interesting one for production

Most of the social media attention went to the 31B's AIME score. But the 26B A4B is the model that will actually get deployed. It activates 3.8B parameters per token out of 25.2B total, which means it runs at roughly the throughput of a 4B-class model while producing results within 1-3 points of a 31B dense model.

At $0.13/M input on OpenRouter, it sits in a price bracket with models that score 40-60 points lower on most benchmarks. It also slightly undercuts Mistral Small 4 ($0.15/M input) while posting comparable benchmark scores. And because the active parameter count is so low, it has better latency characteristics than any dense model at similar quality levels.

For teams self-hosting, the MoE is even more attractive. 3.8B active parameters means it can run on a single consumer GPU with quantization, while the 31B needs more substantial hardware. The HuggingFace page already has 42+ community quantizations posted for the 31B, and the MoE versions are following fast.
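The single-GPU claim is worth checking with arithmetic. One caveat: for an MoE, all 25.2B weights must sit in memory even though only 3.8B are active per token - sparsity cuts compute, not footprint. Weight memory is roughly total parameters × bits per weight / 8 (KV cache and activations add overhead on top). A rough sizing sketch:

```python
def weight_memory_gb(total_params_b: float, bits: int) -> float:
    """Approximate weight-only memory in GB at a given quantization width."""
    return total_params_b * 1e9 * bits / 8 / 1e9

# MoE keeps all 25.2B weights resident; only per-token compute drops.
for name, params in [("Gemma 4 31B", 30.7), ("Gemma 4 26B A4B", 25.2)]:
    for bits in (16, 8, 4):
        print(f"{name}: {bits}-bit ~ {weight_memory_gb(params, bits):.1f} GB")
```

At 4-bit the MoE lands around 12.6 GB of weights - inside a 16 GB consumer GPU with headroom for KV cache - while the 31B's ~15.4 GB leaves almost none, which is why it wants more substantial hardware.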

What's not there yet

No Google API pricing. Vertex AI and AI Studio both still list Gemma 3 variants. Given that the weights dropped on April 2 and OpenRouter had them up within a day, Google's managed API will probably follow soon, but right now you can't run Gemma 4 through Google's own infrastructure with official pricing.

The E4B and E2B (the smaller audio-capable models) aren't on any hosted provider yet. These are the ones with native audio input - up to 30 seconds of speech for ASR and translation. If that matters for your use case, you'll need to self-host for now.

Training data goes through January 2025. For a model released April 2, 2026, that's a 14-month knowledge gap. It won't know about models or pricing changes from 2025 onward without retrieval augmentation.

So what?

An open-weight model at $0.13-0.14/M input just posted benchmark scores that match or beat models costing 4-8x more. The 26B MoE variant in particular makes it hard to justify a lot of the mid-tier paid API pricing - unless you need features Gemma 4 doesn't have (like longer context, native computer use, or managed SLAs).

We've added both models to our pricing table. If you're currently paying for reasoning capability from a closed provider, run the numbers on whether Gemma 4 26B MoE gets you close enough for your workload. At $0.13 per million input tokens, a penny buys roughly 77,000 tokens, so the switching cost for testing is basically zero.
