Model Release · April 19, 2026 · 7 min read

Gemini 3 Flash: $0.50 per million tokens, thinking on by default, and it actually beats Pro on agentic tasks

Google launched Gemini 3 Flash preview in December 2025. The headline price is $0.50 input / $3.00 output. What the headline does not say: thinking is on by default, thinking tokens bill as output, and on agentic benchmarks this model outperforms Gemini 3.1 Pro at a quarter of the cost.

[Figure: Gemini 3 Flash model card showing pricing and benchmark comparison data. Image source: Google DeepMind]

Gemini 3 Flash (gemini-3-flash-preview) costs $0.50/1M input, $3.00/1M output. The part most pricing roundups miss: thinking defaults to “high” and those thinking tokens bill at the same $3.00/1M rate as your actual output. On benchmarks it trails Gemini 3.1 Pro on pure reasoning but beats it on Toolathlon (49.4% vs 36.4%) and MCP Atlas (57.4% vs 54.1%) -- the agentic tool-use evals that actually matter for agent workloads.

Still preview as of April 2026. Launched December 17, 2025. Batch API: $0.25/$1.50. Context caching: $0.05/1M read.

The thinking default changes your bill

With Gemini 2.5 Flash, thinking was optional and off by default. With Gemini 3 Flash, it defaults to thinking_level=high unless you say otherwise. Every thinking token bills at $3.00/1M - the same rate as regular output tokens. Google does not separate them on the invoice.

In practice: a request with 5K input tokens that generates 3K thinking tokens and 2K actual output will bill 5K output tokens, not 2K. On a pipeline running 1,000 such requests per day, thinking alone adds roughly $270/month at current prices.
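The arithmetic above can be checked with a quick estimate (a minimal sketch, assuming the $3.00/1M output rate quoted in this post and a 30-day month):

```python
# Estimate the monthly cost attributable to thinking tokens on Gemini 3 Flash.
# Thinking tokens bill at the output rate, per the pricing described above.
OUTPUT_PRICE_PER_M = 3.00  # USD per 1M output tokens

def monthly_thinking_cost(thinking_tokens_per_req: int,
                          requests_per_day: int,
                          days: int = 30) -> float:
    """Cost of the thinking tokens alone, in USD."""
    tokens = thinking_tokens_per_req * requests_per_day * days
    return tokens / 1_000_000 * OUTPUT_PRICE_PER_M

# 3K thinking tokens x 1,000 requests/day:
print(monthly_thinking_cost(3_000, 1_000))  # -> 270.0
```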

You can set thinking_level to low, medium, or minimal to reduce this. One warning from the docs: “minimal” does not guarantee thinking is fully off - it just budgets very little for it. Temperature should also stay at 1.0; lowering it can cause the model to loop on math and reasoning tasks.
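As a sketch of how this looks in practice with the google-genai Python SDK: treat the exact field names (`thinking_config`, `thinking_level`) and the accepted level strings as assumptions to verify against the SDK version you actually run.

```python
# Hypothetical request config -- field names assumed from the google-genai SDK;
# verify against your installed version before relying on this.
from google import genai
from google.genai import types

client = genai.Client()  # reads the API key from the environment

response = client.models.generate_content(
    model="gemini-3-flash-preview",
    contents="Classify this ticket: 'my invoice is wrong'",
    config=types.GenerateContentConfig(
        temperature=1.0,  # keep at 1.0 per the looping warning above
        thinking_config=types.ThinkingConfig(thinking_level="low"),
    ),
)
print(response.text)
```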

The full Gemini 3 pricing ladder

Gemini 3 Flash sits between the $0.25 Flash-Lite and the $2.00 Pro. Gemini 2.5 Flash is still available as a stable (non-preview) option at $0.30 input if production reliability matters more than higher reasoning.

| Model | Input / 1M | Output / 1M | Context | Thinking default |
|---|---|---|---|---|
| Gemini 3.1 Pro | $2.00 ($4.00 above 200K) | $12.00 ($18.00 above 200K) | 1M | high |
| Gemini 3 Flash | $0.50 | $3.00 | 1M | high |
| Gemini 3.1 Flash-Lite | $0.25 | $1.50 | 1M | high |
| Gemini 2.5 Flash (stable) | $0.30 | $2.50 | 1M | off |

Prices from Google AI pricing docs, retrieved April 19, 2026. All Gemini 3.x models are currently in preview. Gemini 2.5 Flash is the only stable (GA) option in this family.
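To make the ladder concrete, a small helper can price a single request under each tier. This is a sketch using only the base rates from the table above; it ignores Pro's above-200K surcharge and any batch or caching discounts.

```python
# Per-request cost from the base rates above (USD per 1M tokens).
# Thinking tokens are folded into the billed output count, as Google bills them.
PRICES = {
    "gemini-3.1-pro":        (2.00, 12.00),
    "gemini-3-flash":        (0.50, 3.00),
    "gemini-3.1-flash-lite": (0.25, 1.50),
    "gemini-2.5-flash":      (0.30, 2.50),
}

def request_cost(model: str, input_tokens: int, billed_output_tokens: int) -> float:
    """USD cost of one request at base rates."""
    inp, out = PRICES[model]
    return (input_tokens * inp + billed_output_tokens * out) / 1_000_000

# 8K input, 6K billed output (4K thinking + 2K answer):
for m in PRICES:
    print(f"{m}: ${request_cost(m, 8_000, 6_000):.4f}")
```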

Where Flash actually beats Pro

On reasoning benchmarks, the two are close: Gemini 3 Flash trails Pro slightly on GPQA Diamond (90.4% vs 91.9%) and actually edges ahead on AIME 2025 (95.2% vs 95.0%). That is roughly what you would expect from a model that costs 75% less on input.

The agentic tool-use numbers are where it gets interesting. On Toolathlon - a benchmark for multi-step tool orchestration - Flash scores 49.4% versus Pro's 36.4%. On MCP Atlas, 57.4% versus Pro's 54.1%. Gemini 2.5 Flash scores 3.7% and 3.4% on those same tests. That is not a small gap.

We checked those Toolathlon numbers twice. A Flash-tier model outperforming the Pro tier on multi-step tool orchestration by 13 points is not what you would predict from the pricing hierarchy - yet it shows up consistently across both agentic benchmarks Google published.

| Benchmark | 3 Flash | 3.1 Pro | 2.5 Flash | Claude Sonnet 4.5 |
|---|---|---|---|---|
| GPQA Diamond | 90.4% | 91.9% | 82.8% | 83.4% |
| AIME 2025 | 95.2% | 95.0% | 72.0% | 87.0% |
| SWE-bench Verified | 78.0% | 76.2% | 60.4% | 77.2% |
| Toolathlon | 49.4% | 36.4% | 3.7% | 38.9% |
| MCP Atlas | 57.4% | 54.1% | 3.4% | 43.8% |
| LiveCodeBench Elo | 2316 | 2439 | 1143 | 1418 |
| MMMU-Pro | 81.2% | 81.0% | 66.7% | 68.0% |
| Humanity's Last Exam | 33.7% | 37.5% | 11.0% | |

All results from Google DeepMind's official benchmark table. Gemini 3 Flash results use thinking_level=high. The LiveCodeBench gap (2316 vs 2439) is real - if pure coding output is the main use case, Pro still has an edge.

What it actually costs to run

Two scenarios that show both where Gemini 3 Flash makes sense and where it does not.

Agentic coding: 8K input + 4K thinking + 2K output = 6K billed output, 500 req/day
  • Gemini 3 Flash: $345/mo
  • Gemini 3.1 Pro: $1,380/mo
  • Claude Sonnet 4.5: $1,350/mo

Pro runs to $1,380/month for the same workload; Flash gets comparable or better agentic results for $345.

High-volume classification: 1K input + 200 thinking + 100 output = 300 billed output, 10K req/day
  • Gemini 3 Flash: $495/mo
  • Gemini 3.1 Flash-Lite: $165/mo
  • Gemini 2.5 Flash: $300/mo

Flash-Lite trims this to $165/month; for pure classification the benchmark gap rarely matters.

Both scenarios use thinking_level=high (the default). Using thinking_level=minimal on the classification scenario cuts output costs roughly in half. Run your specific workload through the cost calculator.
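The effect of dialing thinking down can be sketched as below. Note the assumptions: a flat 30-day month and base rates only, so the totals will deviate slightly from the calculator figures quoted above, and the minimal-thinking case assumes thinking drops to roughly zero, which the docs do not guarantee.

```python
# Monthly scenario cost at Gemini 3 Flash base rates ($0.50 in / $3.00 out),
# assuming a 30-day month. Thinking tokens are counted as billed output.
def monthly_cost(input_tok: int, billed_output_tok: int, req_per_day: int,
                 in_price: float = 0.50, out_price: float = 3.00,
                 days: int = 30) -> float:
    per_req = (input_tok * in_price + billed_output_tok * out_price) / 1_000_000
    return per_req * req_per_day * days

# Classification scenario: 1K input, 10K req/day.
high = monthly_cost(1_000, 300, 10_000)     # 200 thinking + 100 output billed
minimal = monthly_cost(1_000, 100, 10_000)  # assumes ~0 thinking tokens
print(high, minimal)  # -> 420.0 240.0
```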

Context caching and Search grounding

Context caching reads at $0.05/1M - 10x cheaper than standard input. For workloads with large shared system prompts or reference documents, that difference compounds quickly. Storage is $1.00 per 1M tokens per hour, so short-lived caches are fine but anything stored for days needs a cost-benefit check.
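That cost-benefit check reduces to a simple break-even: cached reads save $0.45/1M versus standard input, while storage costs $1.00/1M per hour. A minimal sketch, assuming the standard $0.50/1M input rate quoted in this post:

```python
# Break-even for context caching on Gemini 3 Flash, using the rates above.
READ_SAVING_PER_M = 0.50 - 0.05   # $ saved per 1M cached tokens read
STORAGE_PER_M_HOUR = 1.00         # $ per 1M cached tokens stored per hour

def reads_per_hour_to_break_even() -> float:
    """How often per hour the cached prefix must be read to pay for storage."""
    return STORAGE_PER_M_HOUR / READ_SAVING_PER_M

print(reads_per_hour_to_break_even())  # ~2.22 reads/hour
```

If the cached prompt is hit more than about 2.2 times per hour, caching pays for itself; below that, storage eats the savings.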

Google Search grounding works differently from Gemini 2.5: the free tier is 5,000 prompts per month shared across all Gemini 3 models in your project, not per-model. After that, $14 per 1,000 search queries. For an agent that searches on every call, 1,000 requests/day works out to roughly 30K queries a month; after the 5,000 free prompts, that is about $350/month in grounding fees alone, on top of token costs.
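A minimal estimator for that fee, subtracting the shared free tier before applying the $14/1K rate (30-day month assumed):

```python
# Search grounding fees: 5,000 free prompts/month shared across Gemini 3
# models, then $14 per 1,000 queries (rates quoted above).
def monthly_grounding_fee(queries_per_day: int, days: int = 30,
                          free: int = 5_000, rate_per_k: float = 14.0) -> float:
    billable = max(0, queries_per_day * days - free)
    return billable / 1_000 * rate_per_k

print(monthly_grounding_fee(1_000))  # 30K/mo, 25K billable -> 350.0
print(monthly_grounding_fee(150))    # 4.5K/mo, within the free tier -> 0.0
```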

When it makes sense

Good fit
  • Agentic workflows: tool orchestration, multi-step agents, MCP pipelines
  • Coding pipelines where Gemini 3.1 Pro feels overpriced for the gains
  • Complex document analysis mixing text, images, video, and PDFs
  • Workloads that genuinely benefit from deep reasoning per request
  • Batch processing at the $0.25/$1.50 batch input/output rates
Not a good fit
  • High-volume classification or extraction (Flash-Lite is 3x cheaper)
  • Production workloads that cannot tolerate preview-model changes
  • Short-output pipelines where thinking overhead costs more than it adds
  • Search-heavy agents where $14/1,000 grounding queries compounds fast

The preview status has been ongoing since December 2025 with no GA date announced. Four months in preview is not unusual for Gemini models, but it means rate limits are tighter and pricing could change before it becomes stable.

Where it fits

Gemini 3 Flash is genuinely good at agentic work. The Toolathlon and MCP Atlas scores beating Gemini 3.1 Pro are not marketing claims - they are from Google's own benchmark page, and the margins are wide enough to take seriously. For a model at $0.50 input versus Pro's $2.00, that is a real value proposition.

The thing to watch is the thinking default. If you migrate from Gemini 2.5 Flash expecting the same pricing structure, your output costs will be higher than expected until you tune thinking_level for each use case. For classification, set it to minimal. For reasoning and agentic tasks, keep it at medium or high and budget for the extra tokens.

See where it lands against every other model on our pricing page, or run your specific workload through the cost calculator.

Sources