Gemini 3 Flash: $0.50 per million tokens, thinking on by default, and it actually beats Pro on agentic tasks
Google launched Gemini 3 Flash preview in December 2025. The headline price is $0.50 input / $3.00 output. What the headline does not say: thinking is on by default, thinking tokens bill as output, and on agentic benchmarks this model outperforms Gemini 3.1 Pro at a quarter of the cost.

Image source: Google DeepMind
Gemini 3 Flash (gemini-3-flash-preview) costs $0.50/1M input, $3.00/1M output. The part most pricing roundups miss: thinking defaults to “high” and those thinking tokens bill at the same $3.00/1M rate as your actual output. On benchmarks it trails Gemini 3.1 Pro on pure reasoning but beats it on Toolathlon (49.4% vs 36.4%) and MCP Atlas (57.4% vs 54.1%) -- the agentic tool-use evals that actually matter for agent workloads.
Still preview as of April 2026. Launched December 17, 2025. Batch API: $0.25/$1.50. Context caching: $0.05/1M read.
The thinking default changes your bill
With Gemini 2.5 Flash, thinking was optional and off by default. With Gemini 3 Flash, it defaults to thinking_level=high unless you say otherwise. Every thinking token bills at $3.00/1M - the same rate as regular output tokens. Google does not separate them on the invoice.
In practice: a request with 5K input tokens that generates 3K thinking tokens and 2K actual output will bill 5K output tokens, not 2K. On a pipeline running 1,000 such requests per day, thinking alone adds roughly $270/month at current prices.
You can set thinking_level to low, medium, or minimal to reduce this. One warning from the docs: “minimal” does not guarantee thinking is fully off - it just budgets very little for it. Temperature should also stay at 1.0; lowering it can cause the model to loop on math and reasoning tasks.
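The billing math above is easy to sketch. This is a minimal estimator using the published $0.50/$3.00 rates and the hypothetical request shape from the example (5K input, 3K thinking, 2K visible output); the function name is ours, not part of any SDK.

```python
# Sketch of Gemini 3 Flash billing where thinking tokens bill as output.
INPUT_RATE = 0.50 / 1_000_000   # $ per input token
OUTPUT_RATE = 3.00 / 1_000_000  # $ per output token (thinking included)

def request_cost(input_tok: int, thinking_tok: int, output_tok: int) -> float:
    """Thinking tokens bill at the output rate; the invoice does not separate them."""
    return input_tok * INPUT_RATE + (thinking_tok + output_tok) * OUTPUT_RATE

# One request: 5K in, 3K thinking, 2K visible output.
per_request = request_cost(5_000, 3_000, 2_000)

# 1,000 such requests/day for 30 days: thinking alone is 90M output-rate tokens.
thinking_only = 3_000 * 1_000 * 30 * OUTPUT_RATE
print(f"thinking cost/month: ${thinking_only:.2f}")  # $270.00
print(f"per-request cost: ${per_request:.4f}")
```

The point of the sketch: the 3K thinking tokens cost more than the 2K tokens you actually wanted.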
The full Gemini 3 pricing ladder
Gemini 3 Flash sits between the $0.25 Flash-Lite and the $2.00 Pro. Gemini 2.5 Flash is still available as a stable (non-preview) option at $0.30 input if production reliability matters more than higher reasoning.
| Model | Input / 1M | Output / 1M | Context | Thinking default |
|---|---|---|---|---|
| Gemini 3.1 Pro ($4/$18 above 200K) | $2.00 | $12.00 | 1M | high |
| Gemini 3 Flash | $0.50 | $3.00 | 1M | high |
| Gemini 3.1 Flash-Lite | $0.25 | $1.50 | 1M | high |
| Gemini 2.5 Flash (stable) | $0.30 | $2.50 | 1M | off |
Prices from Google AI pricing docs, retrieved April 19, 2026. All Gemini 3.x models are currently in preview. Gemini 2.5 Flash is the only stable (GA) option in this family.
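To make the ladder concrete, here is a per-request comparison using the table's rates. The call shape (10K input, 5K output including thinking) is a hypothetical agent request, and the model keys are informal labels, not API model IDs.

```python
# Per-request cost across the Gemini 3 pricing ladder (rates in $/1M tokens).
PRICES = {
    "gemini-3.1-pro":        (2.00, 12.00),
    "gemini-3-flash":        (0.50, 3.00),
    "gemini-3.1-flash-lite": (0.25, 1.50),
    "gemini-2.5-flash":      (0.30, 2.50),
}

def cost(model: str, input_tok: int, output_tok: int) -> float:
    inp, out = PRICES[model]
    return (input_tok * inp + output_tok * out) / 1_000_000

# Hypothetical agent call: 10K in, 5K out (thinking tokens included in output).
for model in PRICES:
    print(f"{model}: ${cost(model, 10_000, 5_000):.4f}")
```

At this shape, Pro lands at $0.08 per call versus Flash's $0.02.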
Where Flash actually beats Pro
On reasoning benchmarks, Gemini 3 Flash roughly matches Pro: it trails slightly on GPQA Diamond (90.4% vs 91.9%) and edges ahead on AIME 2025 (95.2% vs 95.0%). That is about what you would expect from a model that costs 75% less on input.
The agentic tool-use numbers are where it gets interesting. On Toolathlon - a benchmark for multi-step tool orchestration - Flash scores 49.4% versus Pro's 36.4%. On MCP Atlas, 57.4% versus Pro's 54.1%. Gemini 2.5 Flash scores 3.7% and 3.4% on those same tests. That is not a small gap.
We checked those Toolathlon numbers twice. A Flash-tier model outperforming the Pro tier on multi-step tool orchestration by 13 points is not what you would predict from the pricing hierarchy - yet it shows up consistently across both agentic benchmarks Google published.
| Benchmark | 3 Flash | 3.1 Pro | 2.5 Flash | Claude Sonnet 4.5 |
|---|---|---|---|---|
| GPQA Diamond | 90.4% | 91.9% | 82.8% | 83.4% |
| AIME 2025 | 95.2% | 95.0% | 72.0% | 87.0% |
| SWE-bench Verified | 78.0% | 76.2% | 60.4% | 77.2% |
| Toolathlon | 49.4% | 36.4% | 3.7% | 38.9% |
| MCP Atlas | 57.4% | 54.1% | 3.4% | 43.8% |
| LiveCodeBench Elo | 2316 | 2439 | 1143 | 1418 |
| MMMU-Pro | 81.2% | 81.0% | 66.7% | 68.0% |
| Humanity's Last Exam | 33.7% | 37.5% | 11.0% | — |
All results from Google DeepMind's official benchmark table. Gemini 3 Flash results use thinking_level=high. The LiveCodeBench gap (2316 vs 2439) is real - if pure coding output is the main use case, Pro still has an edge.
What it actually costs to run
Two scenarios show where Gemini 3 Flash makes sense and where it does not:
- Agentic workload: Pro runs to $1,380/month for the same traffic -- Flash gets comparable or better agentic results for $345
- Classification workload: Flash-Lite trims this to $165/month -- for pure classification the benchmark gap rarely matters
Both scenarios use thinking_level=high (the default). Using thinking_level=minimal on the classification scenario cuts output costs roughly in half. Run your specific workload through the cost calculator.
Context caching and Search grounding
Context caching reads at $0.05/1M - 10x cheaper than standard input. For workloads with large shared system prompts or reference documents, that difference compounds quickly. Storage is $1.00 per 1M tokens per hour, so short-lived caches are fine but anything stored for days needs a cost-benefit check.
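A rough break-even sketch for that cost-benefit check, using the rates above: each cached read saves the gap between standard and cached input pricing, while storage bills per token-hour. The variable names here are ours.

```python
# Context-cache break-even: reads save the standard/cached input gap;
# storage bills per 1M tokens per hour.
STANDARD_INPUT = 0.50    # $/1M tokens, standard input
CACHED_READ = 0.05       # $/1M tokens, cached read
STORAGE_PER_HOUR = 1.00  # $/1M tokens per hour of storage

saving_per_read = STANDARD_INPUT - CACHED_READ  # $0.45 per 1M tokens
breakeven_reads_per_hour = STORAGE_PER_HOUR / saving_per_read
print(f"{breakeven_reads_per_hour:.2f} reads/hour to break even")  # 2.22
```

In other words, a cache that gets hit a few times an hour pays for itself; one that sits idle for days does not.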
Google Search grounding works differently from Gemini 2.5: the free tier is 5,000 prompts per month shared across all Gemini 3 models in your project, not per-model. After that, $14 per 1,000 search queries. For an agent that searches on every call, 1,000 requests/day is roughly 30,000 queries/month -- $420/month in grounding fees before the free allowance, about $350/month after it, on top of token costs.
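The grounding math, sketched with the free allowance netted off first (the function and its flag are illustrative, not an API):

```python
# Monthly Search grounding cost: 5,000 free prompts shared project-wide,
# then $14 per 1,000 queries.
FREE_PROMPTS = 5_000
RATE_PER_1K = 14.00

def grounding_cost(queries_per_month: int, apply_free_tier: bool = True) -> float:
    billable = queries_per_month - (FREE_PROMPTS if apply_free_tier else 0)
    return max(0, billable) / 1_000 * RATE_PER_1K

# 1,000 searching requests/day ~= 30,000 queries/month.
print(grounding_cost(30_000, apply_free_tier=False))  # 420.0 gross
print(grounding_cost(30_000))                         # 350.0 after free tier
```

Either way, for a search-on-every-call agent the grounding line item rivals the token bill itself.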
When it makes sense
- Agentic workflows: tool orchestration, multi-step agents, MCP pipelines
- Coding pipelines where Gemini 3.1 Pro feels overpriced for the gains
- Complex document analysis mixing text, images, video, and PDFs
- Workloads that genuinely benefit from deep reasoning per request
- Batch processing at $0.25/$1.50 batch input/output pricing
When it does not
- High-volume classification or extraction (Flash-Lite is 3x cheaper)
- Production workloads that cannot tolerate preview-model changes
- Short-output pipelines where thinking overhead costs more than it adds
- Search-heavy agents where $14/1,000 grounding queries compounds fast
The model has been in preview since December 2025 with no GA date announced. Four months in preview is not unusual for Gemini models, but it means rate limits are tighter and pricing could change before it becomes stable.
Where it fits
Gemini 3 Flash is genuinely good at agentic work. The Toolathlon and MCP Atlas scores beating Gemini 3.1 Pro are not marketing claims - they are from Google's own benchmark page, and the margins are wide enough to take seriously. For a model at $0.50 input versus Pro's $2.00, that is a real value proposition.
The thing to watch is the thinking default. If you migrate from Gemini 2.5 Flash expecting the same pricing structure, your output costs will be higher than expected until you tune thinking_level for each use case. For classification, set it to minimal. For reasoning and agentic tasks, keep it at medium or high and budget for the extra tokens.
See where it lands against every other model on our pricing page, or run your specific workload through the cost calculator.