Gemini 3.1 Flash-Lite: $0.25 per million tokens, 1M context, and benchmark scores that beat Claude Haiku
Google shipped Gemini 3.1 Flash-Lite on March 3 as a preview model. It costs $0.25 per million input tokens, handles 1M token contexts, and scores 86.9% on GPQA Diamond - 14 points higher than Claude 4.5 Haiku, which costs 4x more. Here's the full breakdown.

TL;DR
- Pricing: $0.25 / 1M input, $1.50 / 1M output. Batch API halves both to $0.125 / $0.75.
- Context: 1,048,576 tokens (1M). 65,536 max output. Context caching at $0.025/1M.
- Benchmarks: 86.9% GPQA Diamond, 76.8% MMMU-Pro, 72.0% LiveCodeBench. Beats Claude 4.5 Haiku on all three.
- vs predecessor: AA Intelligence Index up from 13 to 34 (roughly 2.6x), but 2.5x pricier on input and 3.75x on output.
- Status: Preview only. Model ID: `gemini-3.1-flash-lite-preview` (minimal call sketch below). May change before GA.
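Trying it takes a few lines. Here's a minimal sketch assuming the google-genai Python SDK (`pip install google-genai`); the model ID comes from Google's docs, while the API key handling and prompt are illustrative:

```python
# Minimal call sketch, assuming the google-genai Python SDK.
# The model ID is from Google's docs; the prompt is illustrative.
from google import genai

client = genai.Client(api_key="YOUR_API_KEY")  # or set GEMINI_API_KEY in the env

response = client.models.generate_content(
    model="gemini-3.1-flash-lite-preview",
    contents="Classify the sentiment of: 'Battery life is disappointing.'",
)
print(response.text)
```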
What changed from Gemini 2.5 Flash-Lite
The previous Flash-Lite was a $0.10/$0.40 model with an AA Intelligence Index score of 13. The new one is $0.25/$1.50 with a score of 34 - roughly 2.6x the measured score at 2.5-3.75x the cost.
Whether that trade-off makes sense depends on what you're doing. If you were running high-volume classification with Gemini 2.5 Flash-Lite and it was working, benchmark before migrating. The old model was genuinely cheap. This one is a different product at a different price point.
Google claims 2.5x faster time to first token and 45% faster output speed versus the previous generation. Artificial Analysis puts throughput at approximately 239 tokens per second, which is solid for a preview model.
Pricing breakdown
At $0.25 input, Gemini 3.1 Flash-Lite sits between GPT-4o mini ($0.15) and GPT-5 mini ($0.75). The relevant comparison isn't the cheapest tier, though - it's Claude 4.5 Haiku at $1.00, which this model consistently outperforms on benchmarks at a quarter of the input cost.
| Model | Input / 1M | Output / 1M | Context | GPQA |
|---|---|---|---|---|
| GPT-4o mini | $0.15 | $0.60 | 128K | — |
| Mistral Small 4 | $0.15 | $0.60 | 256K | 71.2% |
| Gemini 3.1 Flash-Lite | $0.25 | $1.50 | 1M | 86.9% |
| Gemini 2.5 Flash | $0.30 | $2.50 | 1M | 82.8% |
| GPT-5 mini | $0.75 | $4.50 | 400K | 82.3% |
| Claude 4.5 Haiku | $1.00 | $5.00 | 200K | 73.0% |
Prices from Google AI pricing docs and official provider pages, retrieved March 23, 2026. GPQA Diamond from The Decoder / Artificial Analysis. Full model list on our pricing page.
Benchmark scores
Google didn't publish a clean table at launch. These numbers come from Artificial Analysis and The Decoder's coverage. Treat them as directionally accurate rather than verified official scores.
| Benchmark | 3.1 Flash-Lite | 2.5 Flash-Lite | Claude 4.5 Haiku | GPT-5 mini |
|---|---|---|---|---|
| GPQA Diamond | 86.9% | 66.7% | 73.0% | 82.3% |
| MMMU-Pro | 76.8% | 51.0% | 58.0% | 74.1% |
| LiveCodeBench | 72.0% | 34.3% | 53.2% | 80.4% |
| MMMLU (multilingual) | 88.9% | 84.5% | 83.0% | 84.9% |
| CharXiv Reasoning | 73.2% | 55.5% | 61.7% | 75.5% |
| Video-MMMU | 84.8% | 60.7% | — | 82.5% |
| AA Intelligence Index | 34 | 13 | 37 | — |
The GPQA result is the one that stands out. GPQA Diamond tests graduate-level science reasoning - physics, chemistry, biology questions written by PhD students. Flash-Lite at 86.9% beats Claude 4.5 Haiku at 73.0% by a margin that matters, and costs a quarter as much on input. The coding benchmark (LiveCodeBench) is where it falls short - GPT-5 mini at 80.4% is noticeably better.
The 1M context window
Claude 4.5 Haiku tops out at 200K tokens. GPT-4o mini at 128K. Gemini 3.1 Flash-Lite does 1M - roughly 750,000 words, or about 8 novels. For most workloads that's more than you'll ever use. For some, it's the deciding factor.
The catch: long-context recall degrades. MRCR benchmark scores fall from 60.1% at 128K to 12.3% at 1M tokens. That last number is low enough that you should not rely on needle-in-a-haystack retrieval at very long contexts. If you're repeatedly querying the same large reference, context caching at $0.025/1M tokens is a better tool - it at least makes those repeated reads cheap (sketched below).
Where 1M context does help: passing large codebases as context, batching many documents into a single request to cut overhead, or cases where the model only needs part of a large in-context reference.
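Here's what the caching pattern looks like - a sketch assuming the google-genai Python SDK's explicit caching API; the file name and question are illustrative, and whether the preview model supports caching is an assumption worth verifying:

```python
# Cache a large reference once, then run multiple cheap queries against it.
# Assumes the google-genai SDK's explicit caching API; preview-model support
# for caching is an assumption to verify.
from google import genai
from google.genai import types

client = genai.Client()  # reads GEMINI_API_KEY from the environment

with open("contract_bundle.txt") as f:  # illustrative large document
    reference = f.read()

cache = client.caches.create(
    model="gemini-3.1-flash-lite-preview",
    config=types.CreateCachedContentConfig(
        contents=[reference],
        ttl="3600s",  # keep the cached tokens alive for an hour
    ),
)

response = client.models.generate_content(
    model="gemini-3.1-flash-lite-preview",
    contents="Which clauses mention indemnification?",
    config=types.GenerateContentConfig(cached_content=cache.name),
)
print(response.text)
```

You pay the $0.25/1M ingestion once; subsequent calls bill the cached tokens at the $0.025/1M cache rate instead of the full input rate.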
Real cost scenarios
Three common workload shapes compared against Claude 4.5 Haiku, the natural alternative given the benchmark overlap - see the costing sketch below.
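As a rough illustration, here's a sketch that prices three assumed workload shapes. The per-token prices come from the tables above; the request volumes and token counts are illustrative assumptions, not measurements:

```python
# Rough monthly cost comparison: Gemini 3.1 Flash-Lite vs Claude 4.5 Haiku.
# Prices ($ per 1M tokens) are from the tables above; the workload shapes
# (requests/day, tokens per request) are illustrative assumptions.

PRICES = {  # model: (input, output) $ per 1M tokens
    "Gemini 3.1 Flash-Lite": (0.25, 1.50),
    "Claude 4.5 Haiku": (1.00, 5.00),
}

WORKLOADS = {  # name: (requests/day, input tokens/req, output tokens/req)
    "classification": (50_000, 500, 20),
    "document extraction": (5_000, 20_000, 1_000),
    "long-context QA": (500, 200_000, 2_000),
}

def monthly_cost(reqs_per_day, in_tok, out_tok, in_price, out_price, days=30):
    """Monthly cost in dollars, assuming no caching or batch discount."""
    in_millions = reqs_per_day * days * in_tok / 1e6
    out_millions = reqs_per_day * days * out_tok / 1e6
    return in_millions * in_price + out_millions * out_price

for workload, shape in WORKLOADS.items():
    for model, (in_price, out_price) in PRICES.items():
        cost = monthly_cost(*shape, in_price, out_price)
        print(f"{workload:20s} {model:24s} ${cost:>10,.2f}")
```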
These are basic estimates with no caching; context caching and the batch API can cut them further. Run your own numbers with our cost calculator.
Watch the output cost
$1.50/1M output sounds fine until you compare it to Mistral Small 4 or GPT-4o mini, both at $0.60/1M. For output-heavy workloads - writing, code generation, long summaries - that 2.5x gap adds up.
Example: 1,000 daily requests generating 2,000 output tokens each. Monthly output tokens: 60M. Gemini 3.1 Flash-Lite: $90/month. Mistral Small 4: $36/month. The benchmark advantage needs to be worth the extra $54/month for that workload. For input-heavy pipelines where outputs are short, the output premium barely registers, and Flash-Lite undercuts the comparable-quality options (Claude 4.5 Haiku, GPT-5 mini) by a wide margin.
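A quick sketch that checks the arithmetic above and then flips the mix; the prices are from the tables above, and the input-heavy request shape is an illustrative assumption:

```python
# Verify the worked example above, then flip to an input-heavy mix.
# Prices ($ per 1M tokens) are from the tables above.

# Output-heavy: 1,000 requests/day x 2,000 output tokens x 30 days = 60M tokens.
out_millions = 1_000 * 2_000 * 30 / 1e6
print(out_millions * 1.50)  # Gemini 3.1 Flash-Lite output bill: $90.00
print(out_millions * 0.60)  # Mistral Small 4 output bill: $36.00

def per_request(in_tok, out_tok, in_price, out_price):
    """Cost of a single request in dollars."""
    return in_tok / 1e6 * in_price + out_tok / 1e6 * out_price

# Input-heavy shape (assumed): 10,000 input tokens, 50 output tokens.
print(per_request(10_000, 50, 0.25, 1.50))  # Flash-Lite: ~$0.0026
print(per_request(10_000, 50, 1.00, 5.00))  # Claude 4.5 Haiku: ~$0.0103
```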
When it makes sense, and when it doesn't
Where it fits:
- High-volume classification with short outputs
- Multimodal pipelines: text, images, video, audio, PDFs in one call
- Workloads needing 200K-1M context at sub-$0.30 input cost
- Science and reasoning tasks where GPQA performance matters
- Translation and multilingual processing at scale
- Batch jobs where the 50% batch discount applies
Where it doesn't:
- Output-heavy workloads (Mistral Small 4 is 2.5x cheaper on output)
- Precision coding tasks (LiveCodeBench: 72% vs 80% for GPT-5 mini)
- Production workloads that can't tolerate preview-model changes
- Real-time audio/video streaming (Live API not supported)
- Reliable needle-in-a-haystack retrieval at very long context
The preview status is worth taking seriously. Google raised Flash-Lite input pricing 2.5x between generations ($0.10 to $0.25). Build in the assumption that GA pricing may differ from what's shown today.
Bottom line
Gemini 3.1 Flash-Lite is a real step up. Going from 13 to 34 on the AA Intelligence Index is not a rounding error, and the GPQA gap over Claude 4.5 Haiku (86.9% vs 73.0%) is large enough to matter for reasoning-heavy workloads. At $0.25 input vs Haiku's $1.00, you get better benchmark performance for a quarter of the cost.
The limits are real too: output costs more than the cheapest alternatives, coding performance lags GPT-5 mini, and the model is still in preview. For input-heavy classification, extraction, and multimodal pipelines where outputs are concise, it's probably the best option in the sub-$0.30 input tier right now.
Compare it against every model on our pricing page, or run your specific workload numbers with the cost calculator.
Sources
- Google Blog: Introducing Gemini 3.1 Flash-Lite (March 3, 2026)
- Google AI Developer Docs: Gemini 3.1 Flash-Lite model card
- Google AI pricing
- The Decoder: Gemini 3.1 Flash-Lite benchmarks and pricing analysis (March 3, 2026)
- Artificial Analysis: Gemini 3.1 Flash-Lite speed and benchmark data
- Anthropic: Claude API pricing