Gemini 2.0 Flash is deprecated: what migration actually costs you
Google deprecated Gemini 2.0 Flash on February 18, 2026. It shuts down June 1. The obvious replacement - Gemini 2.5 Flash - costs 3x more on input and 6x more on output. There is a cheaper path, and most teams are probably picking the wrong one.

Both gemini-2.0-flash and gemini-2.0-flash-lite shut down June 1, 2026. Gemini 2.5 Flash-Lite is priced identically to 2.0 Flash at $0.10/$0.40 per million tokens - and has 8x the output token limit. Gemini 2.5 Flash is the bigger upgrade at $0.30/$2.50, but thinking tokens can push that output rate much higher unless you explicitly set thinkingBudget: 0. If you're on 2.0 Flash-Lite, the move to 2.5 Flash-Lite is a 33% price bump and a clear improvement. Most people are migrating to the wrong model.
The full 2.0 family is going away
Google deprecated four models on February 18, 2026, all with a June 1 shutdown:
| Model | Shutdown | Replace with |
|---|---|---|
| gemini-2.0-flash | June 1, 2026 | gemini-2.5-flash |
| gemini-2.0-flash-001 | June 1, 2026 | gemini-2.5-flash |
| gemini-2.0-flash-lite | June 1, 2026 | gemini-2.5-flash-lite |
| gemini-2.0-flash-lite-001 | June 1, 2026 | gemini-2.5-flash-lite |
This is the last of the 2.0 generation. The experimental and preview variants were already gone - Flash Thinking shut down December 2025, Flash Live shortly after. June 1 is the final sweep.
Google says June 1 is "the earliest possible" date and will send advance notice before the actual cutoff. That said, planning for June 1 is the safe assumption.
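If swapping the string at every call site feels risky, one option is to centralize the mapping so the cutover is a one-line change. A minimal sketch, assuming your code looks model ids up from shared config; MODEL_OVERRIDE is a hypothetical environment variable for staging the rollout, not an SDK feature:

```js
// Map each deprecated model id to its recommended replacement (per the table above).
const MODEL_MIGRATION = {
  "gemini-2.0-flash": "gemini-2.5-flash",
  "gemini-2.0-flash-001": "gemini-2.5-flash",
  "gemini-2.0-flash-lite": "gemini-2.5-flash-lite",
  "gemini-2.0-flash-lite-001": "gemini-2.5-flash-lite",
};

// Resolve the model to call: an explicit override wins, then the migration map,
// then whatever was configured originally.
function resolveModel(configured) {
  return process.env.MODEL_OVERRIDE ?? MODEL_MIGRATION[configured] ?? configured;
}

console.log(resolveModel("gemini-2.0-flash")); // "gemini-2.5-flash"
```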
What you are moving to
Here are the current Flash-tier options side by side. The deprecated models are flagged in the Status column.
| Model | Input / 1M | Output / 1M | Max output | Status |
|---|---|---|---|---|
| gemini-2.0-flash | $0.10 | $0.40 | 8K | Deprecated |
| gemini-2.0-flash-lite | $0.075 | $0.30 | 8K | Deprecated |
| gemini-2.5-flash-lite | $0.10 | $0.40 | 65K | GA |
| gemini-3.1-flash-lite-preview | $0.25 | $1.50 | 65K | Preview |
| gemini-2.5-flash | $0.30 | $2.50 | 65K | GA |
| gemini-2.5-pro | $1.25 | $10.00 | 65K | GA |
Text input pricing. Audio rates differ. Gemini 2.5 Pro input price is for prompts under 200K tokens. Source: ai.google.dev/gemini-api/docs/pricing
What migration costs per month
Take a typical production workload: 50M input tokens and 50M output tokens per month. Document Q&A, daily summaries, something in that range.
| Path | Input cost | Output cost | Monthly total | vs 2.0 Flash |
|---|---|---|---|---|
| gemini-2.0-flash (current, deprecated) | $5.00 | $20.00 | $25.00 | baseline |
| gemini-2.5-flash-lite | $5.00 | $20.00 | $25.00 | same |
| gemini-2.5-flash, thinking off | $15.00 | $125.00 | $140.00 | +460% |
| gemini-2.5-flash, thinking on (est. 1K thinking/resp) | $15.00 | $250.00+ | $265.00+ | +960%+ |
The last row is not a hypothetical worst case. If you swap the model string and leave everything else unchanged, thinking fires by default, and every thinking token bills at the output rate - so your effective output cost lands well above what the $2.50/1M headline suggests. Google's own docs call this out explicitly.
The 2.5 Flash-Lite row deserves more attention. It matches 2.0 Flash pricing exactly and it is GA-stable. For most workloads doing extraction, classification, or summarization, it is the correct landing point.
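The table numbers are easy to reproduce. A quick sketch of the arithmetic using the published per-million rates and the same 50M/50M workload; the 50M of thinking tokens in the last call is the table's estimate (roughly 1K thinking tokens per 1K of output), not a measured figure:

```js
// Published per-1M-token rates (USD), text input.
const RATES = {
  "gemini-2.0-flash": { input: 0.10, output: 0.40 },
  "gemini-2.5-flash-lite": { input: 0.10, output: 0.40 },
  "gemini-2.5-flash": { input: 0.30, output: 2.50 },
};

// Thinking tokens bill at the output rate, so they simply add to output volume.
function monthlyCost(model, inputM, outputM, thinkingM = 0) {
  const r = RATES[model];
  return inputM * r.input + (outputM + thinkingM) * r.output;
}

console.log(monthlyCost("gemini-2.0-flash", 50, 50));      // 25
console.log(monthlyCost("gemini-2.5-flash-lite", 50, 50));  // 25
console.log(monthlyCost("gemini-2.5-flash", 50, 50));       // 140
console.log(monthlyCost("gemini-2.5-flash", 50, 50, 50));   // 265, with ~1K thinking per 1K output
```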
The thinking token billing problem
Gemini 2.5 Flash has a thinkingBudget setting. When thinking runs, those tokens bill at the full output rate - $2.50 per million. A 500-token response with 1,000 thinking tokens bills as 1,500 output tokens, not 500.
Google called this out in the 2.5 Flash launch post: "If you want to keep the lowest cost and latency while still improving performance over 2.0 Flash, set the thinking budget to 0." We tested this ourselves and it works as described - thinking-off 2.5 Flash is noticeably faster than thinking-on, and the output is identical for straightforward classification and extraction tasks.
Disable thinking in Gemini 2.5 Flash
```js
import { GoogleGenerativeAI } from "@google/generative-ai";

const genAI = new GoogleGenerativeAI(process.env.GEMINI_API_KEY);
const response = await genAI
  .getGenerativeModel({ model: "gemini-2.5-flash" })
  .generateContent({
    contents: [{ role: "user", parts: [{ text: prompt }] }],
    generationConfig: {
      // 0 disables thinking; left unset, the model spends thinking tokens billed at the output rate
      thinkingConfig: { thinkingBudget: 0 },
    },
  });
```
Which path to take
The right migration depends on what 2.0 Flash was actually doing.
High-volume, cost-sensitive workloads (classification, extraction, summarization)
Go to Gemini 2.5 Flash-Lite. It costs $0.10/$0.40 - same as 2.0 Flash - and the output limit jumped from 8K to 65K tokens. For structured extraction or classification, the quality difference is unlikely to matter. Run your prompts against both models first, but 2.5 Flash-Lite will probably be fine.
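One way to run that check is a small harness that sends the same prompts to both model ids and prints the outputs side by side. A sketch in the same Node SDK style as the snippet above; the PROMPTS list stands in for your own test set:

```js
import { GoogleGenerativeAI } from "@google/generative-ai";

const genAI = new GoogleGenerativeAI(process.env.GEMINI_API_KEY);
const MODELS = ["gemini-2.0-flash", "gemini-2.5-flash-lite"];
const PROMPTS = ["Classify the sentiment of: 'The refund took three weeks.'"];

for (const prompt of PROMPTS) {
  for (const modelId of MODELS) {
    const model = genAI.getGenerativeModel({ model: modelId });
    const result = await model.generateContent(prompt);
    // response.text() concatenates the text parts of the first candidate.
    console.log(`--- ${modelId} ---\n${result.response.text()}\n`);
  }
}
```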
Complex generation or reasoning tasks
Go to Gemini 2.5 Flash with thinkingBudget: 0. You get meaningfully better reasoning than 2.0 Flash, plus 8x the output token limit. At $0.30/$2.50 it is significantly more expensive, but there is no cheaper option in the Gemini family at this capability level.
Gemini 2.0 Flash-Lite users
Migrate to Gemini 2.5 Flash-Lite. The price bump is 33% ($0.075 to $0.10, $0.30 to $0.40). You gain a knowledge cutoff update from August 2024 to January 2025, context caching support, and an 8x jump in output token limit. The easiest migration in the table.
What else changes
| Feature | 2.0 Flash | 2.5 Flash-Lite | 2.5 Flash |
|---|---|---|---|
| Max output tokens | 8,192 | 65,536 | 65,536 |
| Input context window | 1M | 1M | 1M |
| Knowledge cutoff | Aug 2024 | Jan 2025 | Jan 2025 |
| Thinking mode | Experimental | No | Yes (configurable) |
| Context caching | Yes | Yes | Yes |
| File search | No | No | Yes |
| URL context | No | No | Yes |
| Grounding with Search | Yes | Yes | Yes |
The 8K output cap on 2.0 Flash was a real constraint for anything producing long code or documents. Both 2.5 models raise that to 65K. If your app was hitting truncation errors, the migration fixes that problem at the same cost or better.
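If truncation was the symptom, it is worth setting the output cap explicitly and checking the finish reason after migrating. A sketch in the same SDK style as above; longDocumentPrompt is a placeholder, and the 65,536 cap mirrors the table:

```js
import { GoogleGenerativeAI } from "@google/generative-ai";

const genAI = new GoogleGenerativeAI(process.env.GEMINI_API_KEY);
const model = genAI.getGenerativeModel({ model: "gemini-2.5-flash-lite" });
const result = await model.generateContent({
  contents: [{ role: "user", parts: [{ text: longDocumentPrompt }] }],
  generationConfig: { maxOutputTokens: 65536 },
});

// "MAX_TOKENS" means the response was cut off at the output cap.
const finishReason = result.response.candidates?.[0]?.finishReason;
if (finishReason === "MAX_TOKENS") {
  console.warn("Response truncated - consider raising the cap or chunking the task.");
}
```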
Context caching can offset part of the cost increase
If your workload sends the same large system prompt or document repeatedly, context caching changes the math. The cache read rate is much lower than fresh input.
| Model | Cache write / 1M | Cache read / 1M | vs fresh input |
|---|---|---|---|
| gemini-2.0-flash (deprecated) | $0.025 | $0.025 | 25% of $0.10 |
| gemini-2.5-flash-lite | $0.025 | $0.01 | 10% of $0.10 |
| gemini-2.5-flash | $0.075 | $0.03 | 10% of $0.30 |
On 2.5 Flash, cache reads cost $0.03/1M - 10% of the fresh input rate. If you have a 100K token system prompt queried 1,000 times a day, caching reduces the effective input cost on those tokens from $30 to $3 per day. That can close most of the gap for heavy-context workloads.
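To sanity-check that claim, here is the per-day arithmetic for the example above; only the fresh-versus-cached input rate is modeled, matching the comparison in the text:

```js
// Per-1M-token input rates for gemini-2.5-flash (USD): fresh vs. cache read.
const FRESH_INPUT = 0.30;
const CACHE_READ = 0.03;

const promptTokens = 100_000;  // shared system prompt / document
const queriesPerDay = 1_000;

const tokensPerDayM = (promptTokens * queriesPerDay) / 1_000_000; // 100M tokens
const freshCost = tokensPerDayM * FRESH_INPUT;   // $30/day
const cachedCost = tokensPerDayM * CACHE_READ;   // $3/day

console.log({ freshCost, cachedCost, savedPerDay: freshCost - cachedCost });
```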
What to do before June 1
1. Search your codebase for gemini-2.0-flash and gemini-2.0-flash-lite. Include the pinned variants: the -001 suffixes are also going away. (A scanning script is sketched after this list.)
2. If you're on 2.0 Flash-Lite: swap to 2.5 Flash-Lite. It's essentially the same price and a clear quality improvement. You can do this safely with minimal testing.
3. If you're on 2.0 Flash: run your actual prompts against 2.5 Flash-Lite before assuming you need 2.5 Flash. For classification and extraction, the cheaper model will likely match quality.
4. If you do move to 2.5 Flash: explicitly set thinkingBudget: 0 unless you want the reasoning capability. Unconfigured thinking will spike your output costs beyond the $2.50/1M headline.
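As mentioned in step 1, a short script can do the model-string sweep if you want something more thorough than an editor search. A sketch that walks a directory tree and flags files containing the deprecated ids; the skip list and starting directory are placeholders:

```js
import { readdirSync, readFileSync, statSync } from "node:fs";
import { join } from "node:path";

// Longer ids first; substring matches can overlap, which is fine for a sweep.
const DEPRECATED = [
  "gemini-2.0-flash-lite-001",
  "gemini-2.0-flash-lite",
  "gemini-2.0-flash-001",
  "gemini-2.0-flash",
];

function scan(dir) {
  for (const name of readdirSync(dir)) {
    if (name === "node_modules" || name.startsWith(".")) continue;
    const path = join(dir, name);
    if (statSync(path).isDirectory()) {
      scan(path);
      continue;
    }
    const text = readFileSync(path, "utf8");
    for (const id of DEPRECATED) {
      if (text.includes(id)) console.log(`${path}: ${id}`);
    }
  }
}

scan(process.argv[2] ?? ".");
```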
FAQ
When is Gemini 2.0 Flash being shut down?
June 1, 2026 is the earliest possible shutdown date. Google deprecated the model on February 18, 2026 and says it will give advance notice before the exact cutoff. As of late March, that leaves at least two months to migrate.
How much more expensive is Gemini 2.5 Flash?
Gemini 2.5 Flash costs $0.30/1M input and $2.50/1M output. Gemini 2.0 Flash was $0.10 and $0.40 - a 3x increase on input and 6.25x on output. With thinking tokens unconfigured, effective output costs can be 2-3x higher still.
What is the cheapest replacement for Gemini 2.0 Flash?
Gemini 2.5 Flash-Lite at $0.10 input and $0.40 output per million tokens - the same price as Gemini 2.0 Flash. It is GA-stable and has a much better output token limit (65K vs 8K).
Is Gemini 2.0 Flash-Lite also deprecated?
Yes. Gemini 2.0 Flash-Lite and gemini-2.0-flash-lite-001 are deprecated with the same June 1, 2026 shutdown date. The recommended replacement is Gemini 2.5 Flash-Lite - a 33% price increase with a much better output limit and updated knowledge cutoff.
Do thinking tokens make Gemini 2.5 Flash more expensive than advertised?
Yes. Thinking tokens bill at the same $2.50/1M output rate as regular output tokens. A 500-token response with 1,000 thinking tokens costs as much as a 1,500-token response. Set thinkingBudget: 0 in generationConfig to disable this and use the headline price.