TokenCost · Guide · March 30, 2026 · 8 min read

Gemini 2.0 Flash is deprecated: what migration actually costs you

Google deprecated Gemini 2.0 Flash on February 18, 2026. It shuts down June 1. The obvious replacement - Gemini 2.5 Flash - costs 3x more on input and 6x more on output. There is a cheaper path, and most teams are probably picking the wrong one.

[Image: Google Gemini Flash model family, migration guide from 2.0 to 2.5. Source: Google DeepMind]

Both gemini-2.0-flash and gemini-2.0-flash-lite shut down June 1, 2026. Gemini 2.5 Flash-Lite is priced identically to 2.0 Flash at $0.10/$0.40 per million tokens - and has 8x the output token limit. Gemini 2.5 Flash is the bigger upgrade at $0.30/$2.50, but thinking tokens can push that output rate much higher unless you explicitly set thinkingBudget: 0. If you're on 2.0 Flash-Lite, the move to 2.5 Flash-Lite is a 33% price bump and a clear improvement. Most people are migrating to the wrong model.

The full 2.0 family is going away

Google deprecated four models on February 18, 2026, all with a June 1 shutdown:

| Model | Shutdown | Replace with |
| --- | --- | --- |
| gemini-2.0-flash | June 1, 2026 | gemini-2.5-flash |
| gemini-2.0-flash-001 | June 1, 2026 | gemini-2.5-flash |
| gemini-2.0-flash-lite | June 1, 2026 | gemini-2.5-flash-lite |
| gemini-2.0-flash-lite-001 | June 1, 2026 | gemini-2.5-flash-lite |
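In code, the table above boils down to a four-entry lookup. A minimal sketch (REPLACEMENTS and migrateModelId are illustrative names, not an official API):

```javascript
// Deprecated model ids mapped to their recommended replacements.
const REPLACEMENTS = {
  "gemini-2.0-flash": "gemini-2.5-flash",
  "gemini-2.0-flash-001": "gemini-2.5-flash",
  "gemini-2.0-flash-lite": "gemini-2.5-flash-lite",
  "gemini-2.0-flash-lite-001": "gemini-2.5-flash-lite",
};

function migrateModelId(id) {
  return REPLACEMENTS[id] ?? id; // non-deprecated ids pass through unchanged
}

console.log(migrateModelId("gemini-2.0-flash-001")); // "gemini-2.5-flash"
```

Note that the pinned `-001` variants need their own entries; a plain string match on `gemini-2.0-flash` alone would miss them in config files that pin versions.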

This is the last of the 2.0 generation. The experimental and preview variants were already gone - Flash Thinking shut down December 2025, Flash Live shortly after. June 1 is the final sweep.

Google says June 1 is "the earliest possible" date and will send advance notice before the actual cutoff. That said, planning for June 1 is the safe assumption.

What you are moving to

Here are the current Flash-tier options side by side, with the deprecated models included for comparison.

| Model | Input / 1M | Output / 1M | Max output | Status |
| --- | --- | --- | --- | --- |
| gemini-2.0-flash | $0.10 | $0.40 | 8K | Deprecated |
| gemini-2.0-flash-lite | $0.075 | $0.30 | 8K | Deprecated |
| gemini-2.5-flash-lite | $0.10 | $0.40 | 65K | GA |
| gemini-3.1-flash-lite-preview | $0.25 | $1.50 | 65K | Preview |
| gemini-2.5-flash | $0.30 | $2.50 | 65K | GA |
| gemini-2.5-pro | $1.25 | $10.00 | 65K | GA |

Text input pricing. Audio rates differ. Gemini 2.5 Pro input price is for prompts under 200K tokens. Source: ai.google.dev/gemini-api/docs/pricing

What migration costs per month

Take a typical production workload: 50M input tokens and 50M output tokens per month. Document Q&A, daily summaries, something in that range.

| Path | Input cost | Output cost | Monthly total | vs 2.0 Flash |
| --- | --- | --- | --- | --- |
| gemini-2.0-flash (current, deprecated) | $5.00 | $20.00 | $25.00 | baseline |
| gemini-2.5-flash-lite | $5.00 | $20.00 | $25.00 | same |
| gemini-2.5-flash, thinking off | $15.00 | $125.00 | $140.00 | +460% |
| gemini-2.5-flash, thinking on (est. 1K thinking tokens/response) | $15.00 | $250.00+ | $265.00+ | +960%+ |

The last row is not a hypothetical worst case. If you swap the model string and leave everything else unchanged, thinking tokens will fire by default and your output costs will be much higher than the $2.50/1M headline. Google's own docs call this out explicitly.

The 2.5 Flash-Lite row deserves more attention. It matches 2.0 Flash pricing exactly and it is GA-stable. For most workloads that are doing extraction, classification, or summarization, it will be the correct landing point.
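The arithmetic behind the table is easy to reproduce. A minimal sketch using the text rates quoted above (monthlyCost is an illustrative helper, not part of any SDK):

```javascript
// Per-million text rates from the pricing table above (USD).
const RATES = {
  "gemini-2.0-flash":      { input: 0.10, output: 0.40 },
  "gemini-2.5-flash-lite": { input: 0.10, output: 0.40 },
  "gemini-2.5-flash":      { input: 0.30, output: 2.50 },
};

// Monthly cost in dollars for a given token volume.
function monthlyCost(model, inputTokens, outputTokens) {
  const { input, output } = RATES[model];
  return (inputTokens / 1e6) * input + (outputTokens / 1e6) * output;
}

// The 50M in / 50M out workload from the table:
console.log(monthlyCost("gemini-2.0-flash", 50e6, 50e6));      // ≈ 25
console.log(monthlyCost("gemini-2.5-flash-lite", 50e6, 50e6)); // ≈ 25
console.log(monthlyCost("gemini-2.5-flash", 50e6, 50e6));      // ≈ 140
```

Plug your own monthly volumes in before deciding; the 460% gap only matters if 2.5 Flash-Lite genuinely can't handle your prompts.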

The thinking token billing problem

Gemini 2.5 Flash has a thinkingBudget setting. When thinking runs, those tokens bill at the full output rate - $2.50 per million. A 500-token response with 1,000 thinking tokens bills as 1,500 output tokens, not 500.
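In dollar terms, a sketch of that example (outputCost is a hypothetical helper; $2.50/1M is the 2.5 Flash output rate from the pricing table):

```javascript
const OUTPUT_RATE = 2.50; // $ per 1M output tokens on gemini-2.5-flash

// Thinking tokens bill at the output rate, so billed output tokens
// = visible response tokens + thinking tokens.
function outputCost(responseTokens, thinkingTokens = 0) {
  return ((responseTokens + thinkingTokens) / 1e6) * OUTPUT_RATE;
}

console.log(outputCost(500));       // thinking off: bills 500 tokens
console.log(outputCost(500, 1000)); // thinking on: bills 1,500 tokens, 3x the cost
```

The per-call dollar amounts are tiny; the point is the ratio, which compounds across millions of responses.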

Google called this out in the 2.5 Flash launch post: "If you want to keep the lowest cost and latency while still improving performance over 2.0 Flash, set the thinking budget to 0." We tested this ourselves and it works as described - thinking-off 2.5 Flash is noticeably faster than thinking-on, and the output is identical for straightforward classification and extraction tasks.

Disable thinking in Gemini 2.5 Flash. In the current @google/genai SDK, thinkingConfig sits under config:

import { GoogleGenAI } from "@google/genai";

const ai = new GoogleGenAI({ apiKey: process.env.GEMINI_API_KEY });

const response = await ai.models.generateContent({
  model: "gemini-2.5-flash",
  contents: prompt,
  config: {
    // A budget of 0 disables thinking; output then bills at the headline rate.
    thinkingConfig: { thinkingBudget: 0 },
  },
});

Which path to take

The right migration depends on what 2.0 Flash was actually doing.

High-volume, cost-sensitive workloads (classification, extraction, summarization)

Go to Gemini 2.5 Flash-Lite. It costs $0.10/$0.40 - same as 2.0 Flash - and the output limit jumped from 8K to 65K tokens. For structured extraction or classification, the quality difference is unlikely to matter. Run your prompts against both models first, but 2.5 Flash-Lite will probably be fine.

Complex generation or reasoning tasks

Go to Gemini 2.5 Flash with thinkingBudget: 0. You get meaningfully better reasoning than 2.0 Flash, plus 8x the output token limit. At $0.30/$2.50 it is significantly more expensive, but there is no cheaper option in the Gemini family at this capability level.

Gemini 2.0 Flash-Lite users

Migrate to Gemini 2.5 Flash-Lite. The price bump is 33% ($0.075 to $0.10, $0.30 to $0.40). You gain a knowledge cutoff update from August 2024 to January 2025, context caching support, and an 8x jump in output token limit. The easiest migration in the table.

What else changes

| Feature | 2.0 Flash | 2.5 Flash-Lite | 2.5 Flash |
| --- | --- | --- | --- |
| Max output tokens | 8,192 | 65,536 | 65,536 |
| Input context window | 1M | 1M | 1M |
| Knowledge cutoff | Aug 2024 | Jan 2025 | Jan 2025 |
| Thinking mode | Experimental | No | Yes (configurable) |
| Context caching | Yes | Yes | Yes |
| File search | No | No | Yes |
| URL context | No | No | Yes |
| Grounding with Search | Yes | Yes | Yes |

The 8K output cap on 2.0 Flash was a real constraint for anything producing long code or documents. Both 2.5 models raise that to 65K. If your app was hitting truncation errors, the migration fixes that problem at the same cost or better.

Context caching can offset part of the cost increase

If your workload sends the same large system prompt or document repeatedly, context caching changes the math. The cache read rate is much lower than fresh input.

| Model | Cache write / 1M | Cache read / 1M | vs fresh input |
| --- | --- | --- | --- |
| gemini-2.0-flash (deprecated) | $0.025 | $0.025 | 25% of $0.10 |
| gemini-2.5-flash-lite | $0.025 | $0.01 | 10% of $0.10 |
| gemini-2.5-flash | $0.075 | $0.03 | 10% of $0.30 |

On 2.5 Flash, cache reads cost $0.03/1M - 10% of the fresh input rate. If you have a 100K token system prompt queried 1,000 times a day, caching reduces the effective input cost on those tokens from $30 to $3 per day. That can close most of the gap for heavy-context workloads.
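A back-of-envelope check of that claim. This sketch ignores the one-time cache write and any storage fees, and dailyInputCosts is an illustrative helper, not an API:

```javascript
// Daily input cost for a system prompt re-sent on every query,
// uncached (fresh input rate) vs cached (cache read rate).
function dailyInputCosts(promptTokens, queriesPerDay, freshRate, readRate) {
  const millionsPerDay = (promptTokens * queriesPerDay) / 1e6;
  return {
    uncached: millionsPerDay * freshRate,
    cached: millionsPerDay * readRate,
  };
}

// 100K-token system prompt, 1,000 queries/day on gemini-2.5-flash:
console.log(dailyInputCosts(100_000, 1000, 0.30, 0.03));
// roughly { uncached: 30, cached: 3 }
```

The savings scale with how much of each request is repeated context, so a workload with a fat shared prompt benefits far more than one with unique inputs.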

What to do before June 1

1. Search your codebase for gemini-2.0-flash and gemini-2.0-flash-lite. Include the pinned variants: the -001 suffixes are also going away.
2. If you're on 2.0 Flash-Lite: swap to 2.5 Flash-Lite. It's essentially the same price and a clear quality improvement. You can do this safely with minimal testing.
3. If you're on 2.0 Flash: run your actual prompts against 2.5 Flash-Lite before assuming you need 2.5 Flash. For classification and extraction, the cheaper model will likely match quality.
4. If you do move to 2.5 Flash: explicitly set thinkingBudget: 0 unless you want the reasoning capability. Unconfigured thinking will spike your output costs beyond the $2.50/1M headline.

FAQ

When is Gemini 2.0 Flash being shut down?

June 1, 2026 is the earliest possible shutdown. Google deprecated the model on February 18, 2026 and says they will give advance notice before the exact date. You have at least two months from late March.

How much more expensive is Gemini 2.5 Flash?

Gemini 2.5 Flash costs $0.30/1M input and $2.50/1M output. Gemini 2.0 Flash was $0.10 and $0.40 - a 3x increase on input and 6.25x on output. With thinking tokens unconfigured, effective output costs can be 2-3x higher still.

What is the cheapest replacement for Gemini 2.0 Flash?

Gemini 2.5 Flash-Lite at $0.10 input and $0.40 output per million tokens - the same price as Gemini 2.0 Flash. It is GA-stable and has a much better output token limit (65K vs 8K).

Is Gemini 2.0 Flash-Lite also deprecated?

Yes. Gemini 2.0 Flash-Lite and gemini-2.0-flash-lite-001 are deprecated with the same June 1, 2026 shutdown date. The recommended replacement is Gemini 2.5 Flash-Lite - a 33% price increase with a much better output limit and updated knowledge cutoff.

Do thinking tokens make Gemini 2.5 Flash more expensive than advertised?

Yes. Thinking tokens bill at the same $2.50/1M output rate as regular output tokens. A 500-token response with 1,000 thinking tokens costs as much as a 1,500-token response. Set thinkingBudget: 0 in generationConfig to disable this and use the headline price.
