Model Release · April 19, 2026 · 7 min read

Gemini 3 Flash: $0.50 per million tokens, thinking on by default, and it actually beats Pro on agentic tasks

Google launched Gemini 3 Flash preview in December 2025. The headline price is $0.50 input / $3.00 output. What the headline does not say: thinking is on by default, thinking tokens bill as output, and on agentic benchmarks this model outperforms Gemini 3.1 Pro at a quarter of the cost.

[Figure: Gemini 3 Flash model card showing pricing and benchmark comparison data. Image source: Google DeepMind]

Gemini 3 Flash (gemini-3-flash-preview) costs $0.50/1M input, $3.00/1M output. The part most pricing roundups miss: thinking defaults to “high” and those thinking tokens bill at the same $3.00/1M rate as your actual output. On benchmarks it trails Gemini 3.1 Pro on pure reasoning but beats it on Toolathlon (49.4% vs 36.4%) and MCP Atlas (57.4% vs 54.1%) -- the agentic tool-use evals that actually matter for agent workloads.

Still preview as of April 2026. Launched December 17, 2025. Batch API: $0.25/$1.50. Context caching: $0.05/1M read.

The thinking default changes your bill

With Gemini 2.5 Flash, thinking was optional and off by default. With Gemini 3 Flash, it defaults to thinking_level=high unless you say otherwise. Every thinking token bills at $3.00/1M - the same rate as regular output tokens. Google does not separate them on the invoice.

In practice: a request with 5K input tokens that generates 3K thinking tokens and 2K actual output will bill 5K output tokens, not 2K. On a pipeline running 1,000 such requests per day, thinking alone adds roughly $270/month at current prices.
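The arithmetic above can be checked with a quick estimate (a minimal sketch, assuming the $3.00/1M output rate quoted in this post and a 30-day month):

```python
# Estimate the monthly cost attributable to thinking tokens on Gemini 3 Flash.
# Thinking tokens bill at the output rate, per the pricing described above.
OUTPUT_PRICE_PER_M = 3.00  # USD per 1M output tokens

def monthly_thinking_cost(thinking_tokens_per_req: int,
                          requests_per_day: int,
                          days: int = 30) -> float:
    """Cost of the thinking tokens alone, in USD."""
    tokens = thinking_tokens_per_req * requests_per_day * days
    return tokens / 1_000_000 * OUTPUT_PRICE_PER_M

# 3K thinking tokens x 1,000 requests/day:
print(monthly_thinking_cost(3_000, 1_000))  # -> 270.0
```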

You can set thinking_level to low, medium, or minimal to reduce this. One warning from the docs: “minimal” does not guarantee thinking is fully off - it just budgets very little for it. Temperature should also stay at 1.0; lowering it can cause the model to loop on math and reasoning tasks.
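As a sketch of how this looks in practice with the google-genai Python SDK: treat the exact field names (`thinking_config`, `thinking_level`) and the accepted level strings as assumptions to verify against the SDK version you actually run.

```python
# Hypothetical request config -- field names assumed from the google-genai SDK;
# verify against your installed version before relying on this.
from google import genai
from google.genai import types

client = genai.Client()  # reads the API key from the environment

response = client.models.generate_content(
    model="gemini-3-flash-preview",
    contents="Classify this ticket: 'my invoice is wrong'",
    config=types.GenerateContentConfig(
        temperature=1.0,  # keep at 1.0 per the looping warning above
        thinking_config=types.ThinkingConfig(thinking_level="low"),
    ),
)
print(response.text)
```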

The full Gemini 3 pricing ladder

Gemini 3 Flash sits between the $0.25 Flash-Lite and the $2.00 Pro. Gemini 2.5 Flash is still available as a stable (non-preview) option at $0.30 input if production reliability matters more than higher reasoning.

| Model | Input / 1M | Output / 1M | Context | Thinking default |
|---|---|---|---|---|
| Gemini 3.1 Pro | $2.00 ($4.00 above 200K) | $12.00 ($18.00 above 200K) | 1M | high |
| Gemini 3 Flash | $0.50 | $3.00 | 1M | high |
| Gemini 3.1 Flash-Lite | $0.25 | $1.50 | 1M | high |
| Gemini 2.5 Flash (stable) | $0.30 | $2.50 | 1M | off |

Prices from Google AI pricing docs, retrieved April 19, 2026. All Gemini 3.x models are currently in preview. Gemini 2.5 Flash is the only stable (GA) option in this family.
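To make the ladder concrete, a small helper can price a single request under each tier. This is a sketch using only the base rates from the table above; it ignores Pro's above-200K surcharge and any batch or caching discounts.

```python
# Per-request cost from the base rates above (USD per 1M tokens).
# Thinking tokens are folded into the billed output count, as Google bills them.
PRICES = {
    "gemini-3.1-pro":        (2.00, 12.00),
    "gemini-3-flash":        (0.50, 3.00),
    "gemini-3.1-flash-lite": (0.25, 1.50),
    "gemini-2.5-flash":      (0.30, 2.50),
}

def request_cost(model: str, input_tokens: int, billed_output_tokens: int) -> float:
    """USD cost of one request at base rates."""
    inp, out = PRICES[model]
    return (input_tokens * inp + billed_output_tokens * out) / 1_000_000

# 8K input, 6K billed output (4K thinking + 2K answer):
for m in PRICES:
    print(f"{m}: ${request_cost(m, 8_000, 6_000):.4f}")
```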

Where Flash actually beats Pro

On reasoning benchmarks, the two are close: Gemini 3 Flash trails Pro slightly on GPQA Diamond (90.4% vs 91.9%) and actually edges ahead on AIME 2025 (95.2% vs 95.0%). That is roughly what you would expect from a model that costs 75% less on input.

The agentic tool-use numbers are where it gets interesting. On Toolathlon - a benchmark for multi-step tool orchestration - Flash scores 49.4% versus Pro's 36.4%. On MCP Atlas, 57.4% versus Pro's 54.1%. Gemini 2.5 Flash scores 3.7% and 3.4% on those same tests. That is not a small gap.

We checked those Toolathlon numbers twice. A Flash-tier model outperforming the Pro tier on multi-step tool orchestration by 13 points is not what you would predict from the pricing hierarchy - yet it shows up consistently across both agentic benchmarks Google published.

| Benchmark | 3 Flash | 3.1 Pro | 2.5 Flash | Claude Sonnet 4.5 |
|---|---|---|---|---|
| GPQA Diamond | 90.4% | 91.9% | 82.8% | 83.4% |
| AIME 2025 | 95.2% | 95.0% | 72.0% | 87.0% |
| SWE-bench Verified | 78.0% | 76.2% | 60.4% | 77.2% |
| Toolathlon | 49.4% | 36.4% | 3.7% | 38.9% |
| MCP Atlas | 57.4% | 54.1% | 3.4% | 43.8% |
| LiveCodeBench Elo | 2316 | 2439 | 1143 | 1418 |
| MMMU-Pro | 81.2% | 81.0% | 66.7% | 68.0% |
| Humanity's Last Exam | 33.7% | 37.5% | 11.0% | |

All results from Google DeepMind's official benchmark table. Gemini 3 Flash results use thinking_level=high. The LiveCodeBench gap (2316 vs 2439) is real - if pure coding output is the main use case, Pro still has an edge.

What it actually costs to run

Two scenarios that show both where Gemini 3 Flash makes sense and where it does not.

Agentic coding: 8K input + 4K thinking + 2K output = 6K billed output, 500 req/day
  • Gemini 3 Flash: $345/mo
  • Gemini 3.1 Pro: $1,380/mo
  • Claude Sonnet 4.5: $1,350/mo

Pro runs to $1,380/month for the same workload; Flash gets comparable or better agentic results for $345.

High-volume classification: 1K input + 200 thinking + 100 output = 300 billed output, 10K req/day
  • Gemini 3 Flash: $495/mo
  • Gemini 3.1 Flash-Lite: $165/mo
  • Gemini 2.5 Flash: $300/mo

Flash-Lite trims this to $165/month; for pure classification the benchmark gap rarely matters.

Both scenarios use thinking_level=high (the default). Using thinking_level=minimal on the classification scenario cuts output costs roughly in half. Run your specific workload through the cost calculator.
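The effect of dialing thinking down can be sketched as below. Note the assumptions: a flat 30-day month and base rates only, so the totals will deviate slightly from the calculator figures quoted above, and the minimal-thinking case assumes thinking drops to roughly zero, which the docs do not guarantee.

```python
# Monthly scenario cost at Gemini 3 Flash base rates ($0.50 in / $3.00 out),
# assuming a 30-day month. Thinking tokens are counted as billed output.
def monthly_cost(input_tok: int, billed_output_tok: int, req_per_day: int,
                 in_price: float = 0.50, out_price: float = 3.00,
                 days: int = 30) -> float:
    per_req = (input_tok * in_price + billed_output_tok * out_price) / 1_000_000
    return per_req * req_per_day * days

# Classification scenario: 1K input, 10K req/day.
high = monthly_cost(1_000, 300, 10_000)     # 200 thinking + 100 output billed
minimal = monthly_cost(1_000, 100, 10_000)  # assumes ~0 thinking tokens
print(high, minimal)  # -> 420.0 240.0
```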

Context caching and Search grounding

Context caching reads at $0.05/1M - 10x cheaper than standard input. For workloads with large shared system prompts or reference documents, that difference compounds quickly. Storage is $1.00 per 1M tokens per hour, so short-lived caches are fine but anything stored for days needs a cost-benefit check.
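That cost-benefit check reduces to a simple break-even: cached reads save $0.45/1M versus standard input, while storage costs $1.00/1M per hour. A minimal sketch, assuming the standard $0.50/1M input rate quoted in this post:

```python
# Break-even for context caching on Gemini 3 Flash, using the rates above.
READ_SAVING_PER_M = 0.50 - 0.05   # $ saved per 1M cached tokens read
STORAGE_PER_M_HOUR = 1.00         # $ per 1M cached tokens stored per hour

def reads_per_hour_to_break_even() -> float:
    """How often per hour the cached prefix must be read to pay for storage."""
    return STORAGE_PER_M_HOUR / READ_SAVING_PER_M

print(reads_per_hour_to_break_even())  # ~2.22 reads/hour
```

If the cached prompt is hit more than about 2.2 times per hour, caching pays for itself; below that, storage eats the savings.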

Google Search grounding works differently from Gemini 2.5: the free tier is 5,000 prompts per month shared across all Gemini 3 models in your project, not per-model. After that, $14 per 1,000 search queries. For an agent that searches on every call, 1,000 requests/day works out to roughly 30K queries a month; after the 5,000 free prompts, that is about $350/month in grounding fees alone, on top of token costs.
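A minimal estimator for that fee, subtracting the shared free tier before applying the $14/1K rate (30-day month assumed):

```python
# Search grounding fees: 5,000 free prompts/month shared across Gemini 3
# models, then $14 per 1,000 queries (rates quoted above).
def monthly_grounding_fee(queries_per_day: int, days: int = 30,
                          free: int = 5_000, rate_per_k: float = 14.0) -> float:
    billable = max(0, queries_per_day * days - free)
    return billable / 1_000 * rate_per_k

print(monthly_grounding_fee(1_000))  # 30K/mo, 25K billable -> 350.0
print(monthly_grounding_fee(150))    # 4.5K/mo, within the free tier -> 0.0
```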

When it makes sense

Good fit
  • Agentic workflows: tool orchestration, multi-step agents, MCP pipelines
  • Coding pipelines where Gemini 3.1 Pro feels overpriced for the gains
  • Complex document analysis mixing text, images, video, and PDFs
  • Workloads that genuinely benefit from deep reasoning per request
  • Batch processing at the $0.25/$1.50 batch input/output rates
Not a good fit
  • High-volume classification or extraction (Flash-Lite is 3x cheaper)
  • Production workloads that cannot tolerate preview-model changes
  • Short-output pipelines where thinking overhead costs more than it adds
  • Search-heavy agents where $14/1,000 grounding queries compounds fast

The preview status has been ongoing since December 2025 with no GA date announced. Four months in preview is not unusual for Gemini models, but it means rate limits are tighter and pricing could change before it becomes stable.

Where it fits

Gemini 3 Flash is genuinely good at agentic work. The Toolathlon and MCP Atlas scores beating Gemini 3.1 Pro are not marketing claims - they are from Google's own benchmark page, and the margins are wide enough to take seriously. For a model at $0.50 input versus Pro's $2.00, that is a real value proposition.

The thing to watch is the thinking default. If you migrate from Gemini 2.5 Flash expecting the same pricing structure, your output costs will be higher than expected until you tune thinking_level for each use case. For classification, set it to minimal. For reasoning and agentic tasks, keep it at medium or high and budget for the extra tokens.

See where it lands against every other model on our pricing page, or run your specific workload through the cost calculator.

Sources