Model Release · March 23, 2026 · 8 min read

Gemini 3.1 Flash-Lite: $0.25 per million tokens, 1M context, and benchmark scores that beat Claude Haiku

Google shipped Gemini 3.1 Flash-Lite on March 3 as a preview model. It costs $0.25 per million input tokens, handles 1M token contexts, and scores 86.9% on GPQA Diamond - 14 points higher than Claude 4.5 Haiku, which costs 4x more. Here's the full breakdown.

Gemini 3.1 Flash-Lite model announcement from Google

Image source: Google Blog

TL;DR

  • Pricing: $0.25 / 1M input, $1.50 / 1M output. Batch API halves both to $0.125 / $0.75.
  • Context: 1,048,576 tokens (1M). 65,536 max output. Context caching at $0.025/1M.
  • Benchmarks: 86.9% GPQA Diamond, 76.8% MMMU-Pro, 72.0% LiveCodeBench. Beats Claude 4.5 Haiku on all three.
  • vs predecessor: 2.6x the AA Intelligence Index score (34 vs 13), but 2.5x pricier on input and 3.75x on output.
  • Status: Preview only. Model ID: gemini-3.1-flash-lite-preview. May change before GA. A minimal call sketch follows below.
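
Here's that sketch - a minimal call assuming the google-genai Python SDK with an API key in the environment. The model ID is the preview identifier above; the prompt is a placeholder.

```python
# Minimal sketch: one call to the preview model via the google-genai SDK.
# Assumes GEMINI_API_KEY is set in the environment. The model ID is the
# preview identifier and may change at GA.
from google import genai

client = genai.Client()  # picks up GEMINI_API_KEY from the environment

response = client.models.generate_content(
    model="gemini-3.1-flash-lite-preview",
    contents="Classify this ticket as billing, bug, or feature request: "
             "'I was charged twice for March.'",
)
print(response.text)
```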

What changed from Gemini 2.5 Flash-Lite

The previous Flash-Lite was a $0.10/$0.40 model with an AA Intelligence Index score of 13. The new one is $0.25/$1.50 with a score of 34. That's 2.6x the measured intelligence score at 2.5-3.75x the cost.

Whether that trade-off makes sense depends on what you're doing. If you were running high-volume classification with Gemini 2.5 Flash-Lite and it was working, benchmark before migrating. The old model was genuinely cheap. This one is a different product at a different price point.

Google claims 2.5x faster time to first answer token and 45% faster output speed versus the previous generation. Artificial Analysis puts throughput at approximately 239 tokens per second, which is solid for a preview model.

Pricing breakdown

At $0.25 input, Gemini 3.1 Flash-Lite sits between GPT-4o mini ($0.15) and GPT-5.4 Mini ($0.75). The relevant comparison isn't the cheapest tier though - it's Claude 4.5 Haiku at $1.00, which this model consistently outperforms on benchmarks at a quarter of the input cost.

| Model | Input / 1M | Output / 1M | Context | GPQA |
| --- | --- | --- | --- | --- |
| GPT-4o mini | $0.15 | $0.60 | 128K | |
| Mistral Small 4 | $0.15 | $0.60 | 256K | 71.2% |
| Gemini 3.1 Flash-Lite | $0.25 | $1.50 | 1M | 86.9% |
| Gemini 2.5 Flash | $0.30 | $2.50 | 1M | 82.8% |
| GPT-5.4 Mini | $0.75 | $4.50 | 400K | 82.3% |
| Claude 4.5 Haiku | $1.00 | $5.00 | 200K | 73.0% |

Prices from Google AI pricing docs and official provider pages, retrieved March 23, 2026. GPQA Diamond from The Decoder / Artificial Analysis. Full model list on our pricing page.
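
To make those rates concrete per request, a small Python sketch - prices come from the table, and the 2K-input / 500-output request shape is illustrative:

```python
# Per-request cost from per-million-token prices (USD / 1M tokens).
def request_cost(input_tokens: int, output_tokens: int,
                 input_price: float, output_price: float) -> float:
    return (input_tokens * input_price + output_tokens * output_price) / 1e6

# Illustrative 2K-in / 500-out request, prices from the table above.
flash_lite = request_cost(2_000, 500, 0.25, 1.50)  # $0.00125
haiku = request_cost(2_000, 500, 1.00, 5.00)       # $0.00450
print(f"Flash-Lite ${flash_lite:.5f} vs Haiku ${haiku:.5f} "
      f"({haiku / flash_lite:.1f}x)")               # 3.6x
```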

Benchmark scores

Google didn't publish a clean table at launch. These numbers come from Artificial Analysis and The Decoder's coverage. Treat them as directionally accurate rather than verified official scores.

| Benchmark | 3.1 Flash-Lite | 2.5 Flash-Lite | Claude Haiku 4.5 | GPT-5 mini |
| --- | --- | --- | --- | --- |
| GPQA Diamond | 86.9% | 66.7% | 73.0% | 82.3% |
| MMMU-Pro | 76.8% | 51.0% | 58.0% | 74.1% |
| LiveCodeBench | 72.0% | 34.3% | 53.2% | 80.4% |
| MMMLU (multilingual) | 88.9% | 84.5% | 83.0% | 84.9% |
| CharXiv Reasoning | 73.2% | 55.5% | 61.7% | 75.5% |
| Video-MMMU | 84.8% | 60.7% | | 82.5% |
| AA Intelligence Index | 34 | 13 | | 37 |

The GPQA result is the one that stands out. GPQA Diamond tests graduate-level science reasoning - physics, chemistry, biology questions written by PhD students. Flash-Lite at 86.9% beats Claude 4.5 Haiku at 73.0% by a margin that matters, and costs a quarter as much on input. The coding benchmark (LiveCodeBench) is where it falls short - GPT-5 mini at 80.4% is noticeably better.

The 1M context window

Claude 4.5 Haiku tops out at 200K tokens. GPT-4o mini at 128K. Gemini 3.1 Flash-Lite does 1M - roughly 750,000 words, or about 8 novels. For most workloads that's more than you'll ever use. For some, it's the deciding factor.

The catch: long-context recall degrades. MRCR benchmark scores drop from 60.1% at 128K to 12.3% at 1M tokens. That last number is low enough that you should not rely on needle-in-a-haystack retrieval at very long contexts. Context caching at $0.025/1M tokens makes repeatedly sending a large context cheap, but it does nothing for recall.

Where 1M context does help: passing large codebases as context, batching many documents into a single request to reduce overhead, or use cases where the model only needs to partially process a large in-context reference.
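
For those repeated-large-context cases, explicit caching is the lever. A sketch under the same google-genai SDK assumption as earlier - the file name, TTL, and question are placeholders:

```python
# Sketch: cache a large reference once, then query it repeatedly.
# Cached tokens bill at $0.025/1M instead of $0.25/1M per the pricing above.
# Note: caching has a minimum context size; check the current docs.
from google import genai
from google.genai import types

client = genai.Client()
MODEL = "gemini-3.1-flash-lite-preview"

big_reference = open("reference.txt").read()  # placeholder large document

cache = client.caches.create(
    model=MODEL,
    config=types.CreateCachedContentConfig(
        contents=[big_reference],
        ttl="3600s",  # keep the cached context alive for one hour
    ),
)

response = client.models.generate_content(
    model=MODEL,
    contents="What does section 4 say about refunds?",
    config=types.GenerateContentConfig(cached_content=cache.name),
)
print(response.text)
```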

Real cost scenarios

Three common workloads compared against Claude 4.5 Haiku, which is the natural alternative given the benchmark overlap:

Customer support: 2K input + 500 output, 5,000 requests/day
Gemini 3.1 Flash-Lite: $187/mo · Claude 4.5 Haiku: $675/mo · 3.6x cheaper

Content moderation: 1K input + 200 output, 50,000 requests/day
Gemini 3.1 Flash-Lite: $825/mo · Claude 4.5 Haiku: $3,000/mo · 3.6x cheaper

Document summarization: 10K input + 2K output, 500 requests/day
Gemini 3.1 Flash-Lite: $82/mo · Claude 4.5 Haiku: $300/mo · 3.7x cheaper

Basic estimates assuming a 30-day month and no caching. Context caching and the batch API can cut these further. Run your numbers with our cost calculator.
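
The arithmetic behind those figures is simple to reproduce. A sketch assuming a 30-day month, with prices from the tables above:

```python
# Monthly cost for a workload, given per-million-token prices (USD / 1M).
def monthly_cost(in_tok: int, out_tok: int, req_per_day: int,
                 in_price: float, out_price: float, days: int = 30) -> float:
    requests = req_per_day * days
    return (in_tok * in_price + out_tok * out_price) * requests / 1e6

scenarios = {
    "Customer support":       (2_000, 500, 5_000),
    "Content moderation":     (1_000, 200, 50_000),
    "Document summarization": (10_000, 2_000, 500),
}
for name, (i, o, r) in scenarios.items():
    flash = monthly_cost(i, o, r, 0.25, 1.50)  # Gemini 3.1 Flash-Lite
    haiku = monthly_cost(i, o, r, 1.00, 5.00)  # Claude 4.5 Haiku
    print(f"{name}: ${flash:,.0f}/mo vs ${haiku:,.0f}/mo "
          f"({haiku / flash:.1f}x)")
```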

Watch the output cost

$1.50/1M output sounds fine until you compare it to Mistral Small 4 at $0.60/1M or GPT-4o mini at $0.60/1M. For output-heavy workloads - writing, code generation, long summaries - that 2.5x gap adds up.

Example: 1,000 daily requests generating 2,000 output tokens each. Monthly output tokens: 60M. Gemini 3.1 Flash-Lite: $90/month. Mistral Small 4: $36/month. The benchmark advantage needs to be worth the extra $54/month for that workload. For input-heavy pipelines where outputs are short, the gap narrows to the smaller input-price difference ($0.25 vs $0.15) and the quality premium is easier to justify - though Flash-Lite never beats Mistral Small 4 on raw price alone.
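
A quick sketch of how that gap moves with output share - prices from the comparison table, with an illustrative 10K tokens per request:

```python
# Flash-Lite vs Mistral Small 4 cost ratio as output share varies.
def cost(in_tok: float, out_tok: float,
         in_price: float, out_price: float) -> float:
    return (in_tok * in_price + out_tok * out_price) / 1e6

TOTAL = 10_000  # tokens per request, illustrative
for out_share in (0.05, 0.20, 0.50):
    i, o = TOTAL * (1 - out_share), TOTAL * out_share
    fl = cost(i, o, 0.25, 1.50)   # Gemini 3.1 Flash-Lite
    ms = cost(i, o, 0.15, 0.60)   # Mistral Small 4
    print(f"{out_share:.0%} output: {fl / ms:.2f}x pricier than Mistral")
# 5% output: 1.81x; 20%: 2.08x; 50%: 2.33x - narrows, but never flips.
```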

When it makes sense, and when it doesn't

Good fit
  • High-volume classification with short outputs
  • Multimodal pipelines: text, images, video, audio, PDFs in one call
  • Workloads needing 200K-1M context at sub-$0.30 input cost
  • Science and reasoning tasks where GPQA performance matters
  • Translation and multilingual processing at scale
  • Batch jobs where the 50% batch discount applies
Not a good fit
  • Output-heavy workloads (Mistral Small 4 is 2.5x cheaper on output)
  • Precision coding tasks (LiveCodeBench: 72% vs 80% for GPT-5 mini)
  • Production workloads that can't tolerate preview-model changes
  • Real-time audio/video streaming (Live API not supported)
  • Reliable needle-in-a-haystack retrieval at very long context

The preview status is worth taking seriously. Google raised Flash-Lite input pricing 2.5x between generations ($0.10 to $0.25). Build in the assumption that GA pricing may differ from what's shown now.

Bottom line

Gemini 3.1 Flash-Lite is a real step up. Jumping from 13 to 34 on the AA Intelligence Index is not a rounding error, and the GPQA gap over Claude 4.5 Haiku (86.9% vs 73%) is large enough to matter for reasoning-heavy workloads. At $0.25 input vs Haiku's $1.00, you get better benchmark performance for a quarter of the cost.

The limits are real too: output costs more than the cheapest alternatives, coding performance lags GPT-5 mini, and the model is still in preview. For input-heavy classification, extraction, and multimodal pipelines where outputs are concise, it's probably the best option in the sub-$0.30 input tier right now.

Compare it against every model on our pricing page, or run your specific workload numbers with the cost calculator.

Sources