TokenCost · Guide · March 30, 2026 · 8 min read

Gemini 2.0 Flash is deprecated: what migration actually costs you

Google deprecated Gemini 2.0 Flash on February 18, 2026. It shuts down June 1. The obvious replacement - Gemini 2.5 Flash - costs 3x more on input and 6x more on output. There is a cheaper path, and most teams are probably picking the wrong one.

[Image: Google Gemini Flash model family, migration guide from 2.0 to 2.5. Source: Google DeepMind]

Both gemini-2.0-flash and gemini-2.0-flash-lite shut down June 1, 2026. Gemini 2.5 Flash-Lite is priced identically to 2.0 Flash at $0.10/$0.40 per million tokens - and has 8x the output token limit. Gemini 2.5 Flash is the bigger upgrade at $0.30/$2.50, but thinking tokens can push that output rate much higher unless you explicitly set thinkingBudget: 0. If you're on 2.0 Flash-Lite, the move to 2.5 Flash-Lite is a 33% price bump and a clear improvement. Most people are migrating to the wrong model.

The full 2.0 family is going away

Google deprecated four models on February 18, 2026, all with a June 1 shutdown:

| Model | Shutdown | Replace with |
| --- | --- | --- |
| gemini-2.0-flash | June 1, 2026 | gemini-2.5-flash |
| gemini-2.0-flash-001 | June 1, 2026 | gemini-2.5-flash |
| gemini-2.0-flash-lite | June 1, 2026 | gemini-2.5-flash-lite |
| gemini-2.0-flash-lite-001 | June 1, 2026 | gemini-2.5-flash-lite |
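In code, the table above boils down to a four-entry lookup. A minimal sketch (REPLACEMENTS and migrateModelId are illustrative names, not an official API):

```javascript
// Deprecated model ids mapped to their recommended replacements.
const REPLACEMENTS = {
  "gemini-2.0-flash": "gemini-2.5-flash",
  "gemini-2.0-flash-001": "gemini-2.5-flash",
  "gemini-2.0-flash-lite": "gemini-2.5-flash-lite",
  "gemini-2.0-flash-lite-001": "gemini-2.5-flash-lite",
};

function migrateModelId(id) {
  return REPLACEMENTS[id] ?? id; // non-deprecated ids pass through unchanged
}

console.log(migrateModelId("gemini-2.0-flash-001")); // "gemini-2.5-flash"
```

Note that the pinned `-001` variants need their own entries; a plain string match on `gemini-2.0-flash` alone would miss them in config files that pin versions.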

This is the last of the 2.0 generation. The experimental and preview variants were already gone - Flash Thinking shut down December 2025, Flash Live shortly after. June 1 is the final sweep.

Google says June 1 is "the earliest possible" date and will send advance notice before the actual cutoff. That said, planning for June 1 is the safe assumption.

What you are moving to

Here are the current Flash-tier options side by side, with the deprecated models included for comparison.

| Model | Input / 1M | Output / 1M | Max output | Status |
| --- | --- | --- | --- | --- |
| gemini-2.0-flash | $0.10 | $0.40 | 8K | Deprecated |
| gemini-2.0-flash-lite | $0.075 | $0.30 | 8K | Deprecated |
| gemini-2.5-flash-lite | $0.10 | $0.40 | 65K | GA |
| gemini-3.1-flash-lite-preview | $0.25 | $1.50 | 65K | Preview |
| gemini-2.5-flash | $0.30 | $2.50 | 65K | GA |
| gemini-2.5-pro | $1.25 | $10.00 | 65K | GA |

Text input pricing. Audio rates differ. Gemini 2.5 Pro input price is for prompts under 200K tokens. Source: ai.google.dev/gemini-api/docs/pricing

What migration costs per month

Take a typical production workload: 50M input tokens and 50M output tokens per month. Document Q&A, daily summaries, something in that range.

| Path | Input cost | Output cost | Monthly total | vs 2.0 Flash |
| --- | --- | --- | --- | --- |
| gemini-2.0-flash (current, deprecated) | $5.00 | $20.00 | $25.00 | baseline |
| gemini-2.5-flash-lite | $5.00 | $20.00 | $25.00 | same |
| gemini-2.5-flash, thinking off | $15.00 | $125.00 | $140.00 | +460% |
| gemini-2.5-flash, thinking on (est. 1K thinking tokens/response) | $15.00 | $250.00+ | $265.00+ | +960%+ |

The last row is not a hypothetical worst case. If you swap the model string and leave everything else unchanged, thinking tokens will fire by default and your output costs will be much higher than the $2.50/1M headline. Google's own docs call this out explicitly.

The 2.5 Flash-Lite row deserves more attention. It matches 2.0 Flash pricing exactly and it is GA-stable. For most workloads that are doing extraction, classification, or summarization, it will be the correct landing point.
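The arithmetic behind the table is easy to reproduce. A minimal sketch using the text rates quoted above (monthlyCost is an illustrative helper, not part of any SDK):

```javascript
// Per-million text rates from the pricing table above (USD).
const RATES = {
  "gemini-2.0-flash":      { input: 0.10, output: 0.40 },
  "gemini-2.5-flash-lite": { input: 0.10, output: 0.40 },
  "gemini-2.5-flash":      { input: 0.30, output: 2.50 },
};

// Monthly cost in dollars for a given token volume.
function monthlyCost(model, inputTokens, outputTokens) {
  const { input, output } = RATES[model];
  return (inputTokens / 1e6) * input + (outputTokens / 1e6) * output;
}

// The 50M in / 50M out workload from the table:
console.log(monthlyCost("gemini-2.0-flash", 50e6, 50e6));      // ≈ 25
console.log(monthlyCost("gemini-2.5-flash-lite", 50e6, 50e6)); // ≈ 25
console.log(monthlyCost("gemini-2.5-flash", 50e6, 50e6));      // ≈ 140
```

Plug your own monthly volumes in before deciding; the 460% gap only matters if 2.5 Flash-Lite genuinely can't handle your prompts.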

The thinking token billing problem

Gemini 2.5 Flash has a thinkingBudget setting. When thinking runs, those tokens bill at the full output rate - $2.50 per million. A 500-token response with 1,000 thinking tokens bills as 1,500 output tokens, not 500.
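In dollar terms, a sketch of that example (outputCost is a hypothetical helper; $2.50/1M is the 2.5 Flash output rate from the pricing table):

```javascript
const OUTPUT_RATE = 2.50; // $ per 1M output tokens on gemini-2.5-flash

// Thinking tokens bill at the output rate, so billed output tokens
// = visible response tokens + thinking tokens.
function outputCost(responseTokens, thinkingTokens = 0) {
  return ((responseTokens + thinkingTokens) / 1e6) * OUTPUT_RATE;
}

console.log(outputCost(500));       // thinking off: bills 500 tokens
console.log(outputCost(500, 1000)); // thinking on: bills 1,500 tokens, 3x the cost
```

The per-call dollar amounts are tiny; the point is the ratio, which compounds across millions of responses.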

Google called this out in the 2.5 Flash launch post: "If you want to keep the lowest cost and latency while still improving performance over 2.0 Flash, set the thinking budget to 0." We tested this ourselves and it works as described - thinking-off 2.5 Flash is noticeably faster than thinking-on, and the output is identical for straightforward classification and extraction tasks.

Disable thinking in Gemini 2.5 Flash. In the current @google/genai SDK, thinkingConfig sits under config:

import { GoogleGenAI } from "@google/genai";

const ai = new GoogleGenAI({ apiKey: process.env.GEMINI_API_KEY });

const response = await ai.models.generateContent({
  model: "gemini-2.5-flash",
  contents: prompt,
  config: {
    // A budget of 0 disables thinking; output then bills at the headline rate.
    thinkingConfig: { thinkingBudget: 0 },
  },
});

Which path to take

The right migration depends on what 2.0 Flash was actually doing.

High-volume, cost-sensitive workloads (classification, extraction, summarization)

Go to Gemini 2.5 Flash-Lite. It costs $0.10/$0.40 - same as 2.0 Flash - and the output limit jumped from 8K to 65K tokens. For structured extraction or classification, the quality difference is unlikely to matter. Run your prompts against both models first, but 2.5 Flash-Lite will probably be fine.

Complex generation or reasoning tasks

Go to Gemini 2.5 Flash with thinkingBudget: 0. You get meaningfully better reasoning than 2.0 Flash, plus 8x the output token limit. At $0.30/$2.50 it is significantly more expensive, but there is no cheaper option in the Gemini family at this capability level.

Gemini 2.0 Flash-Lite users

Migrate to Gemini 2.5 Flash-Lite. The price bump is 33% ($0.075 to $0.10, $0.30 to $0.40). You gain a knowledge cutoff update from August 2024 to January 2025, context caching support, and an 8x jump in output token limit. The easiest migration in the table.

What else changes

| Feature | 2.0 Flash | 2.5 Flash-Lite | 2.5 Flash |
| --- | --- | --- | --- |
| Max output tokens | 8,192 | 65,536 | 65,536 |
| Input context window | 1M | 1M | 1M |
| Knowledge cutoff | Aug 2024 | Jan 2025 | Jan 2025 |
| Thinking mode | Experimental | No | Yes (configurable) |
| Context caching | Yes | Yes | Yes |
| File search | No | No | Yes |
| URL context | No | No | Yes |
| Grounding with Search | Yes | Yes | Yes |

The 8K output cap on 2.0 Flash was a real constraint for anything producing long code or documents. Both 2.5 models raise that to 65K. If your app was hitting truncation errors, the migration fixes that problem at the same cost or better.

Context caching can offset part of the cost increase

If your workload sends the same large system prompt or document repeatedly, context caching changes the math. The cache read rate is much lower than fresh input.

| Model | Cache write / 1M | Cache read / 1M | vs fresh input |
| --- | --- | --- | --- |
| gemini-2.0-flash (deprecated) | $0.025 | $0.025 | 25% of $0.10 |
| gemini-2.5-flash-lite | $0.025 | $0.01 | 10% of $0.10 |
| gemini-2.5-flash | $0.075 | $0.03 | 10% of $0.30 |

On 2.5 Flash, cache reads cost $0.03/1M - 10% of the fresh input rate. If you have a 100K token system prompt queried 1,000 times a day, caching reduces the effective input cost on those tokens from $30 to $3 per day. That can close most of the gap for heavy-context workloads.
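A back-of-envelope check of that claim. This sketch ignores the one-time cache write and any storage fees, and dailyInputCosts is an illustrative helper, not an API:

```javascript
// Daily input cost for a system prompt re-sent on every query,
// uncached (fresh input rate) vs cached (cache read rate).
function dailyInputCosts(promptTokens, queriesPerDay, freshRate, readRate) {
  const millionsPerDay = (promptTokens * queriesPerDay) / 1e6;
  return {
    uncached: millionsPerDay * freshRate,
    cached: millionsPerDay * readRate,
  };
}

// 100K-token system prompt, 1,000 queries/day on gemini-2.5-flash:
console.log(dailyInputCosts(100_000, 1000, 0.30, 0.03));
// roughly { uncached: 30, cached: 3 }
```

The savings scale with how much of each request is repeated context, so a workload with a fat shared prompt benefits far more than one with unique inputs.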

What to do before June 1

1. Search your codebase for gemini-2.0-flash and gemini-2.0-flash-lite. Include the pinned variants: the -001 suffixes are also going away.
2. If you're on 2.0 Flash-Lite: swap to 2.5 Flash-Lite. It's essentially the same price and a clear quality improvement. You can do this safely with minimal testing.
3. If you're on 2.0 Flash: run your actual prompts against 2.5 Flash-Lite before assuming you need 2.5 Flash. For classification and extraction, the cheaper model will likely match quality.
4. If you do move to 2.5 Flash: explicitly set thinkingBudget: 0 unless you want the reasoning capability. Unconfigured thinking will spike your output costs beyond the $2.50/1M headline.

FAQ

When is Gemini 2.0 Flash being shut down?

June 1, 2026 is the earliest possible shutdown. Google deprecated the model on February 18, 2026 and says they will give advance notice before the exact date. You have at least two months from late March.

How much more expensive is Gemini 2.5 Flash?

Gemini 2.5 Flash costs $0.30/1M input and $2.50/1M output. Gemini 2.0 Flash was $0.10 and $0.40 - a 3x increase on input and 6.25x on output. With thinking tokens unconfigured, effective output costs can be 2-3x higher still.

What is the cheapest replacement for Gemini 2.0 Flash?

Gemini 2.5 Flash-Lite at $0.10 input and $0.40 output per million tokens - the same price as Gemini 2.0 Flash. It is GA-stable and has a much better output token limit (65K vs 8K).

Is Gemini 2.0 Flash-Lite also deprecated?

Yes. Gemini 2.0 Flash-Lite and gemini-2.0-flash-lite-001 are deprecated with the same June 1, 2026 shutdown date. The recommended replacement is Gemini 2.5 Flash-Lite - a 33% price increase with a much better output limit and updated knowledge cutoff.

Do thinking tokens make Gemini 2.5 Flash more expensive than advertised?

Yes. Thinking tokens bill at the same $2.50/1M output rate as regular output tokens. A 500-token response with 1,000 thinking tokens costs as much as a 1,500-token response. Set thinkingBudget: 0 in generationConfig to disable this and use the headline price.
