Gemini Flex and Priority inference: how Google's new tiers work and what they cost
Google added Flex and Priority inference to the Gemini API on April 2. Flex is 50% off standard pricing, synchronous, and takes 1-15 minutes to respond. Priority costs 75-100% more and guarantees fast responses -- but only if you're Tier 2, and there's a silent downgrade you need to know about.

Until now, the Gemini API had two pricing modes: standard (full price, fast) and batch (50% off, up to 24 hours, async). Google's April 2 announcement added Flex and Priority to fill in the gaps. Flex brings batch-level pricing to synchronous requests -- same endpoint, same call pattern, just slower. Priority lets you pay more to stay out of the throttle queue during load spikes. Neither is obviously right for everyone. Which one matters to you depends on whether your bottleneck is cost or latency variance.
The four tiers side by side
| Tier | Pricing | Latency | Interface | Access |
|---|---|---|---|---|
| Priority | +80% over standard (listed as +75-100%) | Seconds | Synchronous | Tier 2+ only |
| Standard | Full price | Seconds to minutes | Synchronous | Tier 1+ |
| Flex | 50% off standard | 1-15 minutes | Synchronous | Tier 1+ |
| Batch | 50% off standard | Up to 24 hours | Asynchronous | Tier 1+ |
Batch matches Flex on price (50% off standard) but trades the synchronous call for an asynchronous interface: you submit a job, get an ID, and poll until it completes, which can take up to 24 hours.
Flex: batch pricing without the async headache
Flex costs the same as Batch -- 50% off standard -- but it's synchronous. You call the same generateContent endpoint and add one parameter: service_tier: "flex". You wait for the response in the same call. No JSONL files, no polling for job status, no managing batch objects.
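A minimal sketch of what that looks like over REST, assuming `service_tier` sits at the top level of the request body (the exact field placement may differ in your SDK or API version):

```python
import json
import urllib.request

URL = ("https://generativelanguage.googleapis.com/v1beta/"
       "models/gemini-2.5-flash:generateContent")

def build_flex_body(prompt):
    """Standard generateContent body plus the one Flex parameter."""
    return {
        "contents": [{"parts": [{"text": prompt}]}],
        "service_tier": "flex",  # the only change vs a standard call
    }

def flex_generate(api_key, prompt):
    req = urllib.request.Request(
        URL,
        data=json.dumps(build_flex_body(prompt)).encode(),
        headers={"x-goog-api-key": api_key, "Content-Type": "application/json"},
    )
    # One synchronous call; it may block for up to 15 minutes in the
    # Flex queue, so the timeout must comfortably exceed that window.
    return urllib.request.urlopen(req, timeout=16 * 60)
```

The call shape is otherwise identical to a standard request, which is the whole point: no job objects to manage.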
The catch is latency. Flex requests can sit in a queue for up to 15 minutes before Google processes them. For agentic workflows where each step depends on the previous output, 15 minutes per step adds up fast. But for background tasks that don't need to complete immediately -- data extraction pipelines, report generation, embedding jobs -- Flex is often a better fit than Batch because you don't have to build the async infrastructure around it.
One thing the docs are clear about: if Flex capacity is full, you get a 503 or 429. Requests don't silently upgrade to Standard and charge you the higher rate. That's actually the right behavior -- you just need to implement retry logic with exponential backoff on your end.
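A retry loop along those lines might look like the following sketch; `send_request` stands in for whatever callable issues the Flex request and returns a response with a `status_code`:

```python
import random
import time

def backoff_delay(attempt, base=1.0, cap=60.0):
    """Exponential backoff with full jitter: up to 1s, 2s, 4s... capped at 60s."""
    return random.uniform(0, min(cap, base * (2 ** attempt)))

def call_flex_with_retry(send_request, max_attempts=5, base=1.0):
    """Retry on 429/503 (Flex capacity full); return immediately otherwise."""
    for attempt in range(max_attempts):
        resp = send_request()
        if resp.status_code not in (429, 503):
            return resp
        time.sleep(backoff_delay(attempt, base=base))
    return resp  # still throttled after max_attempts; caller decides what's next
```

Jitter matters here: if every client backs off on the same schedule, they all retry at once and hit the full queue together again.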
Priority: paying for reliability, with one catch
Google lists Priority at 75-100% over standard, but the published per-model prices all work out to exactly 1.8x. Gemini 2.5 Flash at standard is $0.30 input / $2.50 output per million tokens; at Priority it's $0.54 / $4.50. In exchange, your requests skip the queue and get processed in seconds, even during high load.
The thing most developers will miss: Priority silently downgrades to Standard when you exceed the Priority rate limits. You get a Standard-speed response billed at Standard rates -- and there's no error, no flag in the response body. The only way to confirm you're actually getting Priority is checking the x-gemini-service-tier response header. If it says standard when you expected priority, you've been downgraded.
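One way to guard against that, assuming the `x-gemini-service-tier` header behaves as described (the helper names here are mine):

```python
def served_tier(headers):
    """Return the tier a response was actually served at, case-insensitively."""
    lowered = {k.lower(): v for k, v in headers.items()}
    return lowered.get("x-gemini-service-tier", "unknown")

def was_downgraded(headers, requested="priority"):
    """True if you asked for Priority but the response came back Standard."""
    return requested == "priority" and served_tier(headers) != "priority"
```

In production you'd log or alert on downgrades rather than fail the request -- the response itself is still valid, you just didn't get the latency guarantee you asked for.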
Priority rate limits are also lower than Standard -- about 0.3x the normal RPM per model. This isn't a bug in the docs; it's by design. Priority is for burst use cases where you occasionally need guaranteed latency, not for sustained high-throughput production traffic. If you need both, you're in Tier 3 territory.
Access requirement: Tier 2 accounts only. That means at least $100 in real paid charges (not free trial credits, not promotional credits) and a 3-day wait from your first successful payment. There's no workaround for new accounts.
Full pricing by model and tier
All prices per 1 million tokens. Batch pricing is identical to Flex. Models marked "Preview" have more restrictive rate limits regardless of tier.
| Model | Standard | Flex (50% off) | Priority (+80%) |
|---|---|---|---|
| Gemini 3.1 Pro Preview (input above 200K context: 2x) | $2.00 in $12.00 out | $1.00 in $6.00 out | $3.60 in $21.60 out |
| Gemini 2.5 Pro (input above 200K context: 2x) | $1.25 in $10.00 out | $0.63 in $5.00 out | $2.25 in $18.00 out |
| Gemini 3 Flash Preview | $0.50 in $3.00 out | $0.25 in $1.50 out | $0.90 in $5.40 out |
| Gemini 2.5 Flash | $0.30 in $2.50 out | $0.15 in $1.25 out | $0.54 in $4.50 out |
| Gemini 3.1 Flash-Lite Preview | $0.25 in $1.50 out | $0.125 in $0.75 out | $0.45 in $2.70 out |
| Gemini 2.5 Flash-Lite | $0.10 in $0.40 out | $0.05 in $0.20 out | $0.18 in $0.72 out |
Context caching is available on all tiers. Cached input costs more on Priority (example: Gemini 3.1 Pro cache is $0.36/M on Priority vs $0.20/M on Standard). Audio input is priced separately and also scales with the tier multiplier.
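The multipliers behind the table are mechanical, so a small helper can sanity-check a bill (prices are the per-million-token figures above; the functions are illustrative):

```python
def tier_price(standard_price, tier):
    """Apply the tier multiplier: Flex is 50% off, Priority is 1.8x."""
    multiplier = {"standard": 1.0, "flex": 0.5, "priority": 1.8}[tier]
    return round(standard_price * multiplier, 3)

def request_cost(in_tokens, out_tokens, in_price, out_price, tier):
    """Dollar cost of one request at a given tier (prices per 1M tokens)."""
    return (in_tokens * tier_price(in_price, tier)
            + out_tokens * tier_price(out_price, tier)) / 1_000_000
```

For example, `tier_price(0.30, "priority")` reproduces the $0.54 Gemini 2.5 Flash input price from the table.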
Which tier actually makes sense for your use case
Flex is the right call if you have agentic workflows where steps are sequential but don't need to complete in real time -- think overnight report generation, classification pipelines you kick off and check later, or batch data processing where the results feed into the next day's job. Flex beats Batch specifically when your pipeline is sequential rather than purely parallel. Batch requires you to manage JSONL files and poll for completion; Flex keeps everything in one synchronous call that you just... wait for.
Priority makes sense for user-facing features where a multi-second delay causes a real problem -- chat interfaces, interactive document analysis, anything where the user is watching a spinner. It also helps in production environments with spiky traffic where standard can get throttled during peaks. Whether the 80% premium is worth it depends on how much you care about latency variance vs. average latency. If you just need fast on average, standard is usually fine. If you need fast even at the 95th percentile, Priority is what you're paying for.
Standard remains the default for most use cases. If you're not optimizing for either budget or guaranteed latency, there's no reason to change tiers.
Five things to know before switching tiers
1. Flex client timeout
The Gemini SDK's default timeout is far shorter than 15 minutes. If you don't override it, your client disconnects from the Flex queue and gets no response. Set your client timeout to 900,000ms (15 minutes) at minimum.
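With the google-genai Python SDK, that would look something like the following sketch -- assuming its `HttpOptions.timeout` field, which takes milliseconds (verify the field name and units against your SDK version's docs):

```python
from google import genai
from google.genai import types

# Raise the client timeout to cover the full 15-minute Flex queue window.
# 900_000 ms = 15 minutes; add headroom if your network is slow.
client = genai.Client(
    api_key="YOUR_API_KEY",
    http_options=types.HttpOptions(timeout=900_000),
)
```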
2. Priority silent downgrade
When you exceed Priority rate limits, requests silently fall through to Standard -- billed at Standard rates, no error thrown. Check the x-gemini-service-tier response header if you need to confirm you're actually getting Priority.
3. Priority rate limits are lower, not higher
Priority accounts get roughly 0.3x the standard RPM per model. It's designed for burst use cases, not sustained high throughput. If you need both guaranteed latency and high throughput, you'll need Tier 3 and a conversation with Google Cloud sales.
4. Priority needs Tier 2 ($100+ real spend, 3-day wait)
New accounts and Tier 1 accounts cannot use Priority inference at all. Tier 2 requires $100 in real billed charges (not free credits or promotional credits) plus at least 3 days from your first successful payment.
5. Context caching costs more on Priority
Cache pricing is tiered too. For Gemini 3.1 Pro: cached input costs $0.36/M on Priority vs $0.20/M on Standard -- the same 1.8x multiplier as regular tokens. If you rely heavily on caching, note that cached tokens don't shelter you from the premium: every token pays the uplift.