Gemini 3.1 Pro: $2 input, tied for #1 on benchmarks, and 20% cheaper than GPT-5.4
Released February 19, Gemini 3.1 Pro Preview has climbed to the top of every major benchmark leaderboard. It costs $2.00 per million input tokens - less than GPT-5.4, less than Claude Opus 4.6 - and it's currently tied for the highest intelligence score measured. Here's the full pricing breakdown and what the benchmarks actually say.

TL;DR
Gemini 3.1 Pro Preview costs $2.00 per million input tokens (under 200K) - 20% less than GPT-5.4 and 60% less than Claude Opus 4.6, with the same top-of-leaderboard intelligence score as GPT-5.4 on Artificial Analysis. Batch API cuts rates in half; context caching runs $0.20/1M. Old model ID gemini-3-pro-preview was shut down March 9 - update to gemini-3.1-pro-preview. No free tier, knowledge cutoff January 2025, and preview status means pricing can change before GA.
Pricing breakdown
There are two price points depending on prompt length. Under 200K tokens, you pay $2.00 input and $12.00 output per million. Once a single request crosses 200K tokens, the entire request - both input and output - gets billed at the higher rate: $4.00 and $18.00. That means a 201K-token prompt costs twice as much to input as a 199K-token one.
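Here's that billing cliff as a quick Python sketch - rates as quoted in this post (preview pricing), plus one assumption on my part: that a request sitting exactly on the boundary bills at the lower rate.

```python
def request_cost(input_tokens: int, output_tokens: int) -> float:
    """Estimate one Gemini 3.1 Pro request in USD.

    Rates from this post (preview pricing, may change before GA).
    Once the prompt crosses 200K tokens, the higher rate applies
    to the whole request, not just the overflow.
    """
    if input_tokens <= 200_000:
        in_rate, out_rate = 2.00, 12.00   # USD per 1M tokens
    else:
        in_rate, out_rate = 4.00, 18.00
    return input_tokens / 1e6 * in_rate + output_tokens / 1e6 * out_rate

# The cliff in action: 2K more prompt tokens, double the input bill.
print(f"{request_cost(199_000, 1_000):.3f}")  # 0.410
print(f"{request_cost(201_000, 1_000):.3f}")  # 0.822
```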
For most workloads, the under-200K tier is the one that matters. At $2.00 input, Gemini 3.1 Pro is 20% cheaper than GPT-5.4 ($2.50) and 60% cheaper than Claude Opus 4.6 ($5.00). That gap is large enough to matter at any meaningful scale.
| Model | Input / 1M | Output / 1M | Context | AA Score |
|---|---|---|---|---|
| Gemini 3.1 Flash-Lite | $0.25 | $1.50 | 1M | 34 |
| Gemini 2.5 Pro | $1.25 | $10.00 | 1M | 47 |
| Gemini 3.1 Pro Preview | $2.00 | $12.00 | 1M | 57 |
| GPT-5.4 | $2.50 | $15.00 | 1.1M | 57 |
| Claude Opus 4.6 | $5.00 | $25.00 | 1M | 53 |
AA Score = Artificial Analysis Intelligence Index v4.0; the leaderboard spans 126 models. Prices from Google AI Studio and official provider docs. Gemini 3.1 Pro shown at the under-200K input rate.
The three inference tiers
Google added tiered inference when they launched the Gemini 3.x lineup. You trade latency guarantees for price. The three options for Gemini 3.1 Pro:
| Tier | Input / 1M | Output / 1M | Notes |
|---|---|---|---|
| Batch / Flex | $1.00 | $6.00 | 50% off standard. Async, up to 24h turnaround. |
| Standard | $2.00 | $12.00 | Default. Shared capacity, no SLA. |
| Priority | $3.60 | $21.60 | 1.8x premium. Reserved capacity, faster queuing. |
Above the 200K threshold, those rates roughly double: standard goes to $4.00/$18.00, batch to $2.00/$9.00, priority to $7.20/$32.40.
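For reference, here's the full rate card from the two tables above in one place - a sketch of the same threshold logic across tiers, using this post's preview pricing as a snapshot:

```python
# Rate card from the tables above, USD per 1M tokens (preview pricing).
# Format: tier -> ((input, output) under 200K, (input, output) over 200K).
RATES = {
    "batch":    ((1.00, 6.00),  (2.00, 9.00)),
    "standard": ((2.00, 12.00), (4.00, 18.00)),
    "priority": ((3.60, 21.60), (7.20, 32.40)),
}

def tier_cost(tier: str, input_tokens: int, output_tokens: int) -> float:
    under, over = RATES[tier]
    in_rate, out_rate = under if input_tokens <= 200_000 else over
    return input_tokens / 1e6 * in_rate + output_tokens / 1e6 * out_rate

# One 150K-in / 5K-out request under each tier:
for tier in RATES:
    print(f"{tier:>8}: ${tier_cost(tier, 150_000, 5_000):.3f}")
# batch: $0.180, standard: $0.360, priority: $0.648
```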
Context caching sits at $0.20/1M tokens (under 200K) or $0.40/1M (over 200K), plus $4.50 per million tokens per hour in storage. Cached tokens cost 90% less than fresh input, so each cache hit saves $1.80 per million tokens; against the $4.50/1M hourly storage fee, caching breaks even at roughly 2.5 hits per hour. If you send the same large document in every request, the math works out quickly on anything document-heavy.
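A back-of-envelope version of that breakeven, using the under-200K rates above (pure arithmetic, nothing API-specific):

```python
# Caching breakeven under the rates above (USD per 1M tokens).
FRESH, CACHED, STORAGE_PER_HOUR = 2.00, 0.20, 4.50

def net_saving_per_hour(hits_per_hour: float) -> float:
    """Net saving per 1M cached tokens per hour; negative means caching loses money."""
    return (FRESH - CACHED) * hits_per_hour - STORAGE_PER_HOUR

print(f"{net_saving_per_hour(2):+.2f}")  # -0.90: not enough reuse
print(f"{net_saving_per_hour(3):+.2f}")  # +0.90: cache pays for itself
```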
Benchmark scores
Google published a full evaluation report in February. The headline number is an Artificial Analysis Intelligence Index score of 57 - tied with GPT-5.4 at the top of a 126-model leaderboard. That index pulls from 10 components including GPQA Diamond, APEX-Agents, and Humanity's Last Exam.
| Benchmark | Gemini 3.1 Pro | Claude Opus 4.6 | GPT-5.4 |
|---|---|---|---|
| GPQA Diamond | 94.3% | 91.3% | 92.4% |
| ARC-AGI-2 | 77.1% | 68.8% | 52.9% |
| SWE-Bench Verified | 80.6% | 80.8% | 80.0% |
| Humanity's Last Exam (no tools) | 44.4% | 40.0% | 34.5% |
| BrowseComp | 85.9% | 84.0% | 65.8% |
| Terminal-Bench 2.0 | 68.5% | 65.4% | 54.0% |
| MMMLU (multilingual) | 92.6% | 91.1% | 89.6% |
Source: Google DeepMind Gemini 3.1 Pro evaluation report (February 2026). Thinking High setting where applicable.
The one area where Claude Opus 4.6 still holds an edge is SWE-Bench Verified: 80.8% vs 80.6%. That's within noise for most practical purposes. On anything involving long-context reasoning, scientific knowledge, or agent web browsing, Gemini 3.1 Pro currently leads.
Speed
Artificial Analysis measures two things separately: time to first token and output speed after that. Gemini 3.1 Pro outputs at 127.4 tokens per second - well above the 73.6 t/s median for comparable frontier models. GPT-5.4 outputs at 74.2 t/s, so for longer completions you get about 1.7x the throughput.
Time to first token is 30.66 seconds - you wait about half a minute before the first token arrives. GPT-5.4's TTFT is 152 seconds in extended thinking mode. Neither is suited for interactive chat where users expect instant feedback. Both are fine for batch or agentic pipelines where latency is measured in minutes anyway.
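Put the two numbers together and total response time is roughly TTFT plus output length divided by throughput. A simplified model that ignores network overhead, using the figures quoted above:

```python
# Rough latency model from the Artificial Analysis figures quoted above.
# GPT-5.4's TTFT is the extended-thinking-mode number.
def response_seconds(output_tokens: int, ttft_s: float, tps: float) -> float:
    return ttft_s + output_tokens / tps

for n in (500, 2_000, 8_000):
    gemini = response_seconds(n, ttft_s=30.66, tps=127.4)
    gpt = response_seconds(n, ttft_s=152.0, tps=74.2)
    print(f"{n:>5} tokens: Gemini 3.1 Pro ~{gemini:.0f}s, GPT-5.4 ~{gpt:.0f}s")
```

The throughput gap widens with output length: at 500 tokens it's about 35s vs 159s, at 8K tokens about 93s vs 260s.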
Migrating from Gemini 3 Pro
If you were using gemini-3-pro-preview, those API calls are now failing. Google shut the model down on March 9, 2026 - less than three weeks after 3.1 Pro launched. The new model ID is gemini-3.1-pro-preview. For agentic pipelines that rely heavily on bash or custom tool definitions, there's also gemini-3.1-pro-preview-customtools.
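In most codebases that's a one-line swap. A minimal sketch with the google-genai Python SDK - the model ID is the one named above, and the client pattern assumes the current google-genai package with an API key in your environment:

```python
from google import genai

client = genai.Client()  # reads the API key from the environment

# model="gemini-3-pro-preview" now errors out - use the new ID.
response = client.models.generate_content(
    model="gemini-3.1-pro-preview",
    contents="Summarize this design doc in five bullet points.",
)
print(response.text)
```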
Gemini 3 expanded thought signatures - the traceability layer for following reasoning steps - to all response part types, not just function calls as in Gemini 2.5. On ARC-AGI-2, the jump from Gemini 3 Pro (around 31%) to 3.1 Pro (77.1%) is the largest single-generation benchmark improvement Google has published for an agentic task.
What it doesn't do
- Knowledge cutoff is January 2025. That's 15 months behind this post. Claude Sonnet 4.6 cuts off August 2025. For time-sensitive questions, use Search grounding (first 5,000 prompts/month free across Gemini 3 models, then $14 per thousand search queries - note Google charges per query, and one prompt can trigger several; a rough sketch of that billing follows this list) or supply context explicitly.
- No free tier. Every token is billed. Unlike Gemini 3 Flash Preview, there's no free quota on 3.1 Pro.
- Preview status. Rate limits are tighter than GA models. Pricing can change before GA. Don't build hard cost budgets around current rates without a buffer.
- 65,536 max output tokens. Claude Opus 4.6 supports 128K. For tasks that produce very long outputs - full codebases, long-form documents - that ceiling matters.
- Output is text only. Input can be text, image, video, audio, or PDF. No Live API, no image generation, no audio output. File search is AI Studio only - not on Vertex AI.
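On the Search grounding point, per-query billing is easy to underestimate. A rough sketch, with two assumptions on my part: the free 5,000-prompt allotment applies first, and you've measured your own average queries per grounded prompt (the 1.5 below is purely illustrative):

```python
# Search grounding: first 5,000 prompts/month free, then $14 per 1,000
# queries - billed per query, and one prompt can trigger several.
def grounding_cost(prompts_per_month: int, queries_per_prompt: float) -> float:
    billable_prompts = max(0, prompts_per_month - 5_000)
    return billable_prompts * queries_per_prompt * 14 / 1_000

print(f"${grounding_cost(20_000, 1.5):,.2f}")  # $315.00
```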
Cost scenarios
Concrete numbers using the standard under-200K rate with a 3:1 input-to-output ratio:
| Volume | Gemini 3.1 Pro | GPT-5.4 | Claude Opus 4.6 |
|---|---|---|---|
| 100K reqs (1K in / 300 out) | $560 | $700 | $1,250 |
| 1M reqs (1K in / 300 out) | $5,600 | $7,000 | $12,500 |
| 10M tokens / day (3:1 ratio) | $45/day | $56/day | $100/day |
| 100M tokens / day (3:1 ratio) | $450/day | $563/day | $1,000/day |
Blended rates: Gemini 3.1 Pro $4.50/1M, GPT-5.4 $5.63/1M, Claude Opus 4.6 $10.00/1M (75/25 input-output split). Use the calculator for your actual ratio; a minimal version is sketched below.
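Here's that calculator in miniature, with prices per 1M tokens as quoted in this post:

```python
# Blended-rate calculator for the table above (USD per 1M tokens).
PRICES = {
    "gemini-3.1-pro":  (2.00, 12.00),
    "gpt-5.4":         (2.50, 15.00),
    "claude-opus-4.6": (5.00, 25.00),
}

def daily_cost(model: str, tokens_per_day: float, input_share: float = 0.75) -> float:
    in_rate, out_rate = PRICES[model]
    blended = input_share * in_rate + (1 - input_share) * out_rate
    return tokens_per_day / 1e6 * blended

for model in PRICES:
    print(f"{model:>15}: ${daily_cost(model, 100e6):,.2f}/day")
# gemini-3.1-pro: $450.00, gpt-5.4: $562.50, claude-opus-4.6: $1,000.00
```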
Which model to use
For top-tier intelligence where cost matters at scale, Gemini 3.1 Pro is the current answer. Same intelligence score as GPT-5.4, 20% cheaper. Claude Opus 4.6 costs more than twice as much on a blended basis (about 2.2x) and only holds a slim edge on SWE-Bench Verified (0.2 points) and long-output tasks.
If you're running agentic pipelines deep in OpenAI's toolchain, GPT-5.4 still has less integration friction. Teams already on Google Cloud or Vertex AI get tighter Search grounding and avoid cross-provider latency with Gemini 3.1 Pro.
One caveat: preview pricing can change. Cost stability matters more than marginal savings for some workloads - Gemini 2.5 Pro at $1.25/$10.00 is the GA alternative. The intelligence gap is smaller than the version numbers suggest, and the pricing is locked.
Where this leaves things
Gemini 3.1 Pro is the first time a model has matched GPT-5.4 on the Artificial Analysis Intelligence Index while coming in cheaper on every billing tier. For teams spending real money on frontier inference, that gap - $2.00 vs $2.50 on input, $12.00 vs $15.00 on output - compounds quickly. At blended rates it's $1.13 per million tokens, so 100M tokens per day works out to about $113 a day, or roughly $3,400 a month; at 1B tokens per day, ten times that.
Preview status is the main risk. Google shut down the previous version three weeks after 3.1 Pro launched, and preview pricing has shifted before at GA. Worth testing seriously now, but route budget-critical workloads through the calculator before committing to the current rate card.
Ankit Aglawe
April 7, 2026 · 8 min read