Model Release · March 23, 2026 · 8 min read

Gemini 3.1 Flash-Lite: $0.25 per million tokens, 1M context, and benchmark scores that beat Claude Haiku

Google shipped Gemini 3.1 Flash-Lite on March 3 as a preview model. It costs $0.25 per million input tokens, handles 1M token contexts, and scores 86.9% on GPQA Diamond - 14 points higher than Claude 4.5 Haiku, which costs 4x more. Here's the full breakdown.

Gemini 3.1 Flash-Lite model announcement from Google

Image source: Google Blog

TL;DR

  • Pricing: $0.25 / 1M input, $1.50 / 1M output. Batch API halves both to $0.125 / $0.75.
  • Context: 1,048,576 tokens (1M). 65,536 max output. Context caching at $0.025/1M.
  • Benchmarks: 86.9% GPQA Diamond, 76.8% MMMU-Pro, 72.0% LiveCodeBench. Beats Claude 4.5 Haiku on all three.
  • vs predecessor: 2.6x the AA Intelligence Index score (34 vs 13), but 2.5x pricier on input and 3.75x on output.
  • Status: Preview only. Model ID: gemini-3.1-flash-lite-preview. May change before GA. A minimal call sketch follows below.
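
Here's that sketch - a minimal call assuming the google-genai Python SDK with an API key in the environment. The model ID is the preview identifier above; the prompt is a placeholder.

```python
# Minimal sketch: one call to the preview model via the google-genai SDK.
# Assumes GEMINI_API_KEY is set in the environment. The model ID is the
# preview identifier and may change at GA.
from google import genai

client = genai.Client()  # picks up GEMINI_API_KEY from the environment

response = client.models.generate_content(
    model="gemini-3.1-flash-lite-preview",
    contents="Classify this ticket as billing, bug, or feature request: "
             "'I was charged twice for March.'",
)
print(response.text)
```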

What changed from Gemini 2.5 Flash-Lite

The previous Flash-Lite was a $0.10/$0.40 model with an AA Intelligence Index score of 13. The new one is $0.25/$1.50 with a score of 34. That's 2.6x the measured intelligence score at 2.5-3.75x the cost.

Whether that trade-off makes sense depends on what you're doing. If you were running high-volume classification with Gemini 2.5 Flash-Lite and it was working, benchmark before migrating. The old model was genuinely cheap. This one is a different product at a different price point.

Google claims 2.5x faster time to first answer token and 45% faster output speed versus the previous generation. Artificial Analysis puts throughput at approximately 239 tokens per second, which is solid for a preview model.

Pricing breakdown

At $0.25 input, Gemini 3.1 Flash-Lite sits between GPT-4o mini ($0.15) and GPT-5.4 Mini ($0.75). The relevant comparison isn't the cheapest tier though - it's Claude 4.5 Haiku at $1.00, which this model consistently outperforms on benchmarks at a quarter of the input cost.

| Model | Input / 1M | Output / 1M | Context | GPQA |
| --- | --- | --- | --- | --- |
| GPT-4o mini | $0.15 | $0.60 | 128K | |
| Mistral Small 4 | $0.15 | $0.60 | 256K | 71.2% |
| Gemini 3.1 Flash-Lite | $0.25 | $1.50 | 1M | 86.9% |
| Gemini 2.5 Flash | $0.30 | $2.50 | 1M | 82.8% |
| GPT-5.4 Mini | $0.75 | $4.50 | 400K | 82.3% |
| Claude 4.5 Haiku | $1.00 | $5.00 | 200K | 73.0% |

Prices from Google AI pricing docs and official provider pages, retrieved March 23, 2026. GPQA Diamond from The Decoder / Artificial Analysis. Full model list on our pricing page.
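
To make those rates concrete per request, a small Python sketch - prices come from the table, and the 2K-input / 500-output request shape is illustrative:

```python
# Per-request cost from per-million-token prices (USD / 1M tokens).
def request_cost(input_tokens: int, output_tokens: int,
                 input_price: float, output_price: float) -> float:
    return (input_tokens * input_price + output_tokens * output_price) / 1e6

# Illustrative 2K-in / 500-out request, prices from the table above.
flash_lite = request_cost(2_000, 500, 0.25, 1.50)  # $0.00125
haiku = request_cost(2_000, 500, 1.00, 5.00)       # $0.00450
print(f"Flash-Lite ${flash_lite:.5f} vs Haiku ${haiku:.5f} "
      f"({haiku / flash_lite:.1f}x)")               # 3.6x
```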

Benchmark scores

Google didn't publish a clean table at launch. These numbers come from Artificial Analysis and The Decoder's coverage. Treat them as directionally accurate rather than verified official scores.

| Benchmark | 3.1 Flash-Lite | 2.5 Flash-Lite | Claude Haiku 4.5 | GPT-5 mini |
| --- | --- | --- | --- | --- |
| GPQA Diamond | 86.9% | 66.7% | 73.0% | 82.3% |
| MMMU-Pro | 76.8% | 51.0% | 58.0% | 74.1% |
| LiveCodeBench | 72.0% | 34.3% | 53.2% | 80.4% |
| MMMLU (multilingual) | 88.9% | 84.5% | 83.0% | 84.9% |
| CharXiv Reasoning | 73.2% | 55.5% | 61.7% | 75.5% |
| Video-MMMU | 84.8% | 60.7% | | 82.5% |
| AA Intelligence Index | 34 | 13 | | 37 |

The GPQA result is the one that stands out. GPQA Diamond tests graduate-level science reasoning - physics, chemistry, biology questions written by PhD students. Flash-Lite at 86.9% beats Claude 4.5 Haiku at 73.0% by a margin that matters, and costs a quarter as much on input. The coding benchmark (LiveCodeBench) is where it falls short - GPT-5 mini at 80.4% is noticeably better.

The 1M context window

Claude 4.5 Haiku tops out at 200K tokens. GPT-4o mini at 128K. Gemini 3.1 Flash-Lite does 1M - roughly 750,000 words, or about 8 novels. For most workloads that's more than you'll ever use. For some, it's the deciding factor.

The catch: long-context recall degrades. MRCR benchmark scores drop from 60.1% at 128K to 12.3% at 1M tokens. That last number is low enough that you should not rely on needle-in-a-haystack retrieval at very long contexts. Context caching at $0.025/1M tokens makes repeatedly sending a large context cheap, but it does nothing for recall.

Where 1M context does help: passing large codebases as context, batching many documents into a single request to reduce overhead, or use cases where the model only needs to partially process a large in-context reference.
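
For those repeated-large-context cases, explicit caching is the lever. A sketch under the same google-genai SDK assumption as earlier - the file name, TTL, and question are placeholders:

```python
# Sketch: cache a large reference once, then query it repeatedly.
# Cached tokens bill at $0.025/1M instead of $0.25/1M per the pricing above.
# Note: caching has a minimum context size; check the current docs.
from google import genai
from google.genai import types

client = genai.Client()
MODEL = "gemini-3.1-flash-lite-preview"

big_reference = open("reference.txt").read()  # placeholder large document

cache = client.caches.create(
    model=MODEL,
    config=types.CreateCachedContentConfig(
        contents=[big_reference],
        ttl="3600s",  # keep the cached context alive for one hour
    ),
)

response = client.models.generate_content(
    model=MODEL,
    contents="What does section 4 say about refunds?",
    config=types.GenerateContentConfig(cached_content=cache.name),
)
print(response.text)
```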

Real cost scenarios

Three common workloads compared against Claude 4.5 Haiku, which is the natural alternative given the benchmark overlap:

Customer support: 2K input + 500 output, 5,000 requests/day
Gemini 3.1 Flash-Lite: $187/mo · Claude 4.5 Haiku: $675/mo · 3.6x cheaper

Content moderation: 1K input + 200 output, 50,000 requests/day
Gemini 3.1 Flash-Lite: $825/mo · Claude 4.5 Haiku: $3,000/mo · 3.6x cheaper

Document summarization: 10K input + 2K output, 500 requests/day
Gemini 3.1 Flash-Lite: $82/mo · Claude 4.5 Haiku: $300/mo · 3.7x cheaper

Basic estimates assuming a 30-day month and no caching. Context caching and the batch API can cut these further. Run your numbers with our cost calculator.
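
The arithmetic behind those figures is simple to reproduce. A sketch assuming a 30-day month, with prices from the tables above:

```python
# Monthly cost for a workload, given per-million-token prices (USD / 1M).
def monthly_cost(in_tok: int, out_tok: int, req_per_day: int,
                 in_price: float, out_price: float, days: int = 30) -> float:
    requests = req_per_day * days
    return (in_tok * in_price + out_tok * out_price) * requests / 1e6

scenarios = {
    "Customer support":       (2_000, 500, 5_000),
    "Content moderation":     (1_000, 200, 50_000),
    "Document summarization": (10_000, 2_000, 500),
}
for name, (i, o, r) in scenarios.items():
    flash = monthly_cost(i, o, r, 0.25, 1.50)  # Gemini 3.1 Flash-Lite
    haiku = monthly_cost(i, o, r, 1.00, 5.00)  # Claude 4.5 Haiku
    print(f"{name}: ${flash:,.0f}/mo vs ${haiku:,.0f}/mo "
          f"({haiku / flash:.1f}x)")
```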

Watch the output cost

$1.50/1M output sounds fine until you compare it to Mistral Small 4 at $0.60/1M or GPT-4o mini at $0.60/1M. For output-heavy workloads - writing, code generation, long summaries - that 2.5x gap adds up.

Example: 1,000 daily requests generating 2,000 output tokens each. Monthly output tokens: 60M. Gemini 3.1 Flash-Lite: $90/month. Mistral Small 4: $36/month. The benchmark advantage needs to be worth the extra $54/month for that workload. For input-heavy pipelines where outputs are short, the gap narrows to the smaller input-price difference ($0.25 vs $0.15) and the quality premium is easier to justify - though Flash-Lite never beats Mistral Small 4 on raw price alone.
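
A quick sketch of how that gap moves with output share - prices from the comparison table, with an illustrative 10K tokens per request:

```python
# Flash-Lite vs Mistral Small 4 cost ratio as output share varies.
def cost(in_tok: float, out_tok: float,
         in_price: float, out_price: float) -> float:
    return (in_tok * in_price + out_tok * out_price) / 1e6

TOTAL = 10_000  # tokens per request, illustrative
for out_share in (0.05, 0.20, 0.50):
    i, o = TOTAL * (1 - out_share), TOTAL * out_share
    fl = cost(i, o, 0.25, 1.50)   # Gemini 3.1 Flash-Lite
    ms = cost(i, o, 0.15, 0.60)   # Mistral Small 4
    print(f"{out_share:.0%} output: {fl / ms:.2f}x pricier than Mistral")
# 5% output: 1.81x; 20%: 2.08x; 50%: 2.33x - narrows, but never flips.
```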

When it makes sense, and when it doesn't

Good fit
  • High-volume classification with short outputs
  • Multimodal pipelines: text, images, video, audio, PDFs in one call
  • Workloads needing 200K-1M context at sub-$0.30 input cost
  • Science and reasoning tasks where GPQA performance matters
  • Translation and multilingual processing at scale
  • Batch jobs where the 50% batch discount applies
Not a good fit
  • Output-heavy workloads (Mistral Small 4 is 2.5x cheaper on output)
  • Precision coding tasks (LiveCodeBench: 72% vs 80% for GPT-5 mini)
  • Production workloads that can't tolerate preview-model changes
  • Real-time audio/video streaming (Live API not supported)
  • Reliable needle-in-a-haystack retrieval at very long context

The preview status is worth taking seriously. Google raised Flash-Lite input pricing 2.5x between generations ($0.10 to $0.25). Build in the assumption that GA pricing may differ from what's shown now.

Bottom line

Gemini 3.1 Flash-Lite is a real step up. Jumping from 13 to 34 on the AA Intelligence Index is not a rounding error, and the GPQA gap over Claude 4.5 Haiku (86.9% vs 73%) is large enough to matter for reasoning-heavy workloads. At $0.25 input vs Haiku's $1.00, you get better benchmark performance for a quarter of the cost.

The limits are real too: output costs more than the cheapest alternatives, coding performance lags GPT-5 mini, and the model is still in preview. For input-heavy classification, extraction, and multimodal pipelines where outputs are concise, it's probably the best option in the sub-$0.30 input tier right now.

Compare it against every model on our pricing page, or run your specific workload numbers with the cost calculator.

Sources