Gemini 3.1 Pro: $2 input, tied for #1 on benchmarks, and 20% cheaper than GPT-5.4
Released February 19, Gemini 3.1 Pro Preview has climbed to the top of every major benchmark leaderboard. It costs $2.00 per million input tokens - less than GPT-5.4, less than Claude Opus 4.6 - and it's currently tied for the highest intelligence score measured. Here's the full pricing breakdown and what the benchmarks actually say.

TL;DR
Gemini 3.1 Pro Preview costs $2.00 per million input tokens (under 200K) - 20% less than GPT-5.4 and 60% less than Claude Opus 4.6, with the same top-of-leaderboard intelligence score as GPT-5.4 on Artificial Analysis. Batch API cuts rates in half; context caching runs $0.20/1M. Old model ID gemini-3-pro-preview was shut down March 9 - update to gemini-3.1-pro-preview. No free tier, knowledge cutoff January 2025, and preview status means pricing can change before GA.
Pricing breakdown
There are two price points depending on prompt length. Under 200K tokens, you pay $2.00 input and $12.00 output per million. Once a single request crosses 200K tokens, the entire request - both input and output - gets billed at the higher rate: $4.00 and $18.00. That means a 201K-token prompt costs twice as much to input as a 199K-token one.
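Here's that billing cliff as a quick Python sketch - rates as quoted in this post (preview pricing), plus one assumption on my part: that a request sitting exactly on the boundary bills at the lower rate.

```python
def request_cost(input_tokens: int, output_tokens: int) -> float:
    """Estimate one Gemini 3.1 Pro request in USD.

    Rates from this post (preview pricing, may change before GA).
    Once the prompt crosses 200K tokens, the higher rate applies
    to the whole request, not just the overflow.
    """
    if input_tokens <= 200_000:
        in_rate, out_rate = 2.00, 12.00   # USD per 1M tokens
    else:
        in_rate, out_rate = 4.00, 18.00
    return input_tokens / 1e6 * in_rate + output_tokens / 1e6 * out_rate

# The cliff in action: 2K more prompt tokens, double the input bill.
print(f"{request_cost(199_000, 1_000):.3f}")  # 0.410
print(f"{request_cost(201_000, 1_000):.3f}")  # 0.822
```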
For most workloads, the under-200K tier is the one that matters. At $2.00 input, Gemini 3.1 Pro is 20% cheaper than GPT-5.4 ($2.50) and 60% cheaper than Claude Opus 4.6 ($5.00). That gap is large enough to matter at any meaningful scale.
| Model | Input / 1M | Output / 1M | Context | AA Score |
|---|---|---|---|---|
| Gemini 3.1 Flash-Lite | $0.25 | $1.50 | 1M | 34 |
| Gemini 2.5 Pro | $1.25 | $10.00 | 1M | 47 |
| Gemini 3.1 Pro Preview | $2.00 | $12.00 | 1M | 57 |
| GPT-5.4 | $2.50 | $15.00 | 1.1M | 57 |
| Claude Opus 4.6 | $5.00 | $25.00 | 1M | 53 |
AA Score = Artificial Analysis Intelligence Index v4.0; the leaderboard spans 126 models. Prices from Google AI Studio and official provider docs. Gemini 3.1 Pro shown at the under-200K input rate.
The three inference tiers
Google added tiered inference when they launched the Gemini 3.x lineup. You trade latency guarantees for price. The three options for Gemini 3.1 Pro:
| Tier | Input / 1M | Output / 1M | Notes |
|---|---|---|---|
| Batch / Flex | $1.00 | $6.00 | 50% off standard. Async, up to 24h turnaround. |
| Standard | $2.00 | $12.00 | Default. Shared capacity, no SLA. |
| Priority | $3.60 | $21.60 | 1.8x premium. Reserved capacity, faster queuing. |
Above the 200K threshold, those rates roughly double: standard goes to $4.00/$18.00, batch to $2.00/$9.00, priority to $7.20/$32.40.
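For reference, here's the full rate card from the two tables above in one place - a sketch of the same threshold logic across tiers, using this post's preview pricing as a snapshot:

```python
# Rate card from the tables above, USD per 1M tokens (preview pricing).
# Format: tier -> ((input, output) under 200K, (input, output) over 200K).
RATES = {
    "batch":    ((1.00, 6.00),  (2.00, 9.00)),
    "standard": ((2.00, 12.00), (4.00, 18.00)),
    "priority": ((3.60, 21.60), (7.20, 32.40)),
}

def tier_cost(tier: str, input_tokens: int, output_tokens: int) -> float:
    under, over = RATES[tier]
    in_rate, out_rate = under if input_tokens <= 200_000 else over
    return input_tokens / 1e6 * in_rate + output_tokens / 1e6 * out_rate

# One 150K-in / 5K-out request under each tier:
for tier in RATES:
    print(f"{tier:>8}: ${tier_cost(tier, 150_000, 5_000):.3f}")
# batch: $0.180, standard: $0.360, priority: $0.648
```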
Context caching sits at $0.20/1M tokens (under 200K) or $0.40/1M (over 200K), plus $4.50 per million tokens per hour in storage. Cached tokens cost 90% less than fresh input, so each cache hit saves $1.80 per million tokens; against the $4.50/1M hourly storage fee, caching breaks even at roughly 2.5 hits per hour. If you send the same large document in every request, the math works out quickly on anything document-heavy.
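A back-of-envelope version of that breakeven, using the under-200K rates above (pure arithmetic, nothing API-specific):

```python
# Caching breakeven under the rates above (USD per 1M tokens).
FRESH, CACHED, STORAGE_PER_HOUR = 2.00, 0.20, 4.50

def net_saving_per_hour(hits_per_hour: float) -> float:
    """Net saving per 1M cached tokens per hour; negative means caching loses money."""
    return (FRESH - CACHED) * hits_per_hour - STORAGE_PER_HOUR

print(f"{net_saving_per_hour(2):+.2f}")  # -0.90: not enough reuse
print(f"{net_saving_per_hour(3):+.2f}")  # +0.90: cache pays for itself
```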
Benchmark scores
Google published a full evaluation report in February. The headline number is an Artificial Analysis Intelligence Index score of 57 - tied with GPT-5.4 at the top of a 126-model leaderboard. That index pulls from 10 components including GPQA Diamond, APEX-Agents, and Humanity's Last Exam.
| Benchmark | Gemini 3.1 Pro | Claude Opus 4.6 | GPT-5.4 |
|---|---|---|---|
| GPQA Diamond | 94.3% | 91.3% | 92.4% |
| ARC-AGI-2 | 77.1% | 68.8% | 52.9% |
| SWE-Bench Verified | 80.6% | 80.8% | 80.0% |
| Humanity's Last Exam (no tools) | 44.4% | 40.0% | 34.5% |
| BrowseComp | 85.9% | 84.0% | 65.8% |
| Terminal-Bench 2.0 | 68.5% | 65.4% | 54.0% |
| MMMLU (multilingual) | 92.6% | 91.1% | 89.6% |
Source: Google DeepMind Gemini 3.1 Pro evaluation report (February 2026). Thinking High setting where applicable.
The one area where Claude Opus 4.6 still holds an edge is SWE-Bench Verified: 80.8% vs 80.6%. That's within noise for most practical purposes. On anything involving long-context reasoning, scientific knowledge, or agent web browsing, Gemini 3.1 Pro currently leads.
Speed
Artificial Analysis measures two things separately: time to first token and output speed after that. Gemini 3.1 Pro outputs at 127.4 tokens per second - well above the 73.6 t/s median for comparable frontier models. GPT-5.4 outputs at 74.2 t/s, so for longer completions you get about 1.7x the throughput.
Time to first token is 30.66 seconds - you wait about half a minute before the first token arrives. GPT-5.4's TTFT is 152 seconds in extended thinking mode. Neither is suited for interactive chat where users expect instant feedback. Both are fine for batch or agentic pipelines where latency is measured in minutes anyway.
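Put the two numbers together and total response time is roughly TTFT plus output length divided by throughput. A simplified model that ignores network overhead, using the figures quoted above:

```python
# Rough latency model from the Artificial Analysis figures quoted above.
# GPT-5.4's TTFT is the extended-thinking-mode number.
def response_seconds(output_tokens: int, ttft_s: float, tps: float) -> float:
    return ttft_s + output_tokens / tps

for n in (500, 2_000, 8_000):
    gemini = response_seconds(n, ttft_s=30.66, tps=127.4)
    gpt = response_seconds(n, ttft_s=152.0, tps=74.2)
    print(f"{n:>5} tokens: Gemini 3.1 Pro ~{gemini:.0f}s, GPT-5.4 ~{gpt:.0f}s")
```

The throughput gap widens with output length: at 500 tokens it's about 35s vs 159s, at 8K tokens about 93s vs 260s.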
Migrating from Gemini 3 Pro
If you were using gemini-3-pro-preview, those API calls are now failing. Google shut the model down on March 9, 2026 - less than three weeks after 3.1 Pro launched. The new model ID is gemini-3.1-pro-preview. For agentic pipelines that rely heavily on bash or custom tool definitions, there's also gemini-3.1-pro-preview-customtools.
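In most codebases that's a one-line swap. A minimal sketch with the google-genai Python SDK - the model ID is the one named above, and the client pattern assumes the current google-genai package with an API key in your environment:

```python
from google import genai

client = genai.Client()  # reads the API key from the environment

# model="gemini-3-pro-preview" now errors out - use the new ID.
response = client.models.generate_content(
    model="gemini-3.1-pro-preview",
    contents="Summarize this design doc in five bullet points.",
)
print(response.text)
```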
Gemini 3 expanded thought signatures - the traceability layer for following reasoning steps - to all response part types, not just function calls as in Gemini 2.5. On ARC-AGI-2, the jump from Gemini 3 Pro (around 31%) to 3.1 Pro (77.1%) is the largest single-generation benchmark improvement Google has published for an agentic task.
What it doesn't do
- Knowledge cutoff is January 2025. That's 15 months behind this post. Claude Sonnet 4.6 cuts off August 2025. For time-sensitive questions, use Search grounding (first 5,000 prompts/month free across Gemini 3 models, then $14 per thousand search queries - note Google charges per query, and one prompt can trigger several; a rough sketch of that billing follows this list) or supply context explicitly.
- No free tier. Every token is billed. Unlike Gemini 3 Flash Preview, there's no free quota on 3.1 Pro.
- Preview status. Rate limits are tighter than GA models. Pricing can change before GA. Don't build hard cost budgets around current rates without a buffer.
- 65,536 max output tokens. Claude Opus 4.6 supports 128K. For tasks that produce very long outputs - full codebases, long-form documents - that ceiling matters.
- Output is text only. Input can be text, image, video, audio, or PDF. No Live API, no image generation, no audio output. File search is AI Studio only - not on Vertex AI.
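On the Search grounding point, per-query billing is easy to underestimate. A rough sketch, with two assumptions on my part: the free 5,000-prompt allotment applies first, and you've measured your own average queries per grounded prompt (the 1.5 below is purely illustrative):

```python
# Search grounding: first 5,000 prompts/month free, then $14 per 1,000
# queries - billed per query, and one prompt can trigger several.
def grounding_cost(prompts_per_month: int, queries_per_prompt: float) -> float:
    billable_prompts = max(0, prompts_per_month - 5_000)
    return billable_prompts * queries_per_prompt * 14 / 1_000

print(f"${grounding_cost(20_000, 1.5):,.2f}")  # $315.00
```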
Cost scenarios
Concrete numbers using the standard under-200K rate with a 3:1 input-to-output ratio:
| Volume | Gemini 3.1 Pro | GPT-5.4 | Claude Opus 4.6 |
|---|---|---|---|
| 100K reqs (1K in / 300 out) | $560 | $700 | $1,250 |
| 1M reqs (1K in / 300 out) | $5,600 | $7,000 | $12,500 |
| 10M tokens / day (3:1 ratio) | $45/day | $56/day | $100/day |
| 100M tokens / day (3:1 ratio) | $450/day | $563/day | $1,000/day |
Blended rates: Gemini 3.1 Pro $4.50/1M, GPT-5.4 $5.63/1M, Claude Opus 4.6 $10.00/1M (75/25 input-output split). Use the calculator for your actual ratio; a minimal version is sketched below.
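Here's that calculator in miniature, with prices per 1M tokens as quoted in this post:

```python
# Blended-rate calculator for the table above (USD per 1M tokens).
PRICES = {
    "gemini-3.1-pro":  (2.00, 12.00),
    "gpt-5.4":         (2.50, 15.00),
    "claude-opus-4.6": (5.00, 25.00),
}

def daily_cost(model: str, tokens_per_day: float, input_share: float = 0.75) -> float:
    in_rate, out_rate = PRICES[model]
    blended = input_share * in_rate + (1 - input_share) * out_rate
    return tokens_per_day / 1e6 * blended

for model in PRICES:
    print(f"{model:>15}: ${daily_cost(model, 100e6):,.2f}/day")
# gemini-3.1-pro: $450.00, gpt-5.4: $562.50, claude-opus-4.6: $1,000.00
```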
Which model to use
For top-tier intelligence where cost matters at scale, Gemini 3.1 Pro is the current answer. Same intelligence score as GPT-5.4, 20% cheaper. Claude Opus 4.6 costs more than twice as much on a blended basis (about 2.2x) and only holds a slim edge on SWE-Bench Verified (0.2 points) and long-output tasks.
If you're running agentic pipelines deep in OpenAI's toolchain, GPT-5.4 still has less integration friction. Teams already on Google Cloud or Vertex AI get tighter Search grounding and avoid cross-provider latency with Gemini 3.1 Pro.
One caveat: preview pricing can change. Cost stability matters more than marginal savings for some workloads - Gemini 2.5 Pro at $1.25/$10.00 is the GA alternative. The intelligence gap is smaller than the version numbers suggest, and the pricing is locked.
Where this leaves things
Gemini 3.1 Pro is the first time a model has matched GPT-5.4 on the Artificial Analysis Intelligence Index while coming in cheaper on every billing tier. For teams spending real money on frontier inference, that gap - $2.00 vs $2.50 on input, $12.00 vs $15.00 on output - compounds quickly. At blended rates it's $1.13 per million tokens, so 100M tokens per day works out to about $113 a day, or roughly $3,400 a month; at 1B tokens per day, ten times that.
Preview status is the main risk. Google shut down the previous version three weeks after 3.1 Pro launched, and preview pricing has shifted before at GA. Worth testing seriously now, but route budget-critical workloads through the calculator before committing to the current rate card.
Ankit Aglawe
April 7, 2026 · 8 min read