Comparison · May 10, 2026 · 10 min read

GPT-5.5 vs Opus 4.7 vs Gemini 3.1 Pro: the cheapest one depends entirely on whether you cross 200K tokens

On a 50K-token coding request, Gemini 3.1 Pro is the cheapest at $0.22, Opus 4.7 next at $0.50, and GPT-5.5 last at $0.55. The numbers feel orderly. They do not stay orderly. As soon as your workload pushes past 200K input tokens, Gemini steps to a higher tier. Past 272K, GPT-5.5 doubles input pricing and lifts output by half. Opus stays nominally flat but bills 25 to 37 percent more tokens than Opus 4.6 did for identical inputs. Every one of these three flagships has a hidden cost cliff. Knowing where the cliffs sit is the difference between a $3.56 bill and an $8.90 bill on the same 800K-token call.

Three flagships, three different ways the price moves on you. The cliffs nobody puts on the comparison table:

| Model | Where the bill changes shape |
| --- | --- |
| GPT-5.5 | $5/$30 below 272K tokens. $10/$45 above. Whole session, not the tail. |
| Opus 4.7 | $5/$25 flat. New tokenizer bills 25-37% more tokens on the same input. |
| Gemini 3.1 Pro | $2/$12 below 200K. $4/$18 above. Whole session too. |

The list price, before the asterisks

All three landed within nine weeks of each other: Gemini 3.1 Pro entered public preview on February 19, Opus 4.7 hit GA on April 16, and GPT-5.5 followed on April 23. The sticker rates per million tokens look like this:

| Model | Input ($/M) | Output ($/M) | Cached input ($/M) | Tier flip |
| --- | --- | --- | --- | --- |
| GPT-5.5 | $5.00 | $30.00 | $0.50 | 2x in / 1.5x out above 272K |
| Claude Opus 4.7 | $5.00 | $25.00 | $0.50 | None on price; 1.25-1.37x tokenizer |
| Gemini 3.1 Pro | $2.00 | $12.00 | $0.50 | $4 in / $18 out above 200K |

Read just the Input and Output columns and Gemini wins by 2.5x on input and 2.1-2.5x on output. That is the headline most coverage stops at. The tier-flip column is what erases or amplifies that order, depending on what you send through each model.

The 272K cliff on GPT-5.5

Of the three asterisks, GPT-5.5's is the steepest. Once a single API call exceeds 272K input tokens, the entire session bills at $10 input / $45 output per million. Not the tokens above the threshold. The whole session. A 271K-token call bills at the standard rate. A 273K-token call bills 2x on every input token and 1.5x on every output token, including the first 272K.

For an 800K input / 20K output call, the math: 0.80M × $10 + 0.02M × $45 = $8.00 + $0.90 = $8.90. Without the surcharge it would be 0.80M × $5 + 0.02M × $30 = $4.00 + $0.60 = $4.60. The surcharge nearly doubles the bill. For long-context retrieval, RAG over a large codebase, or any agent that pulls big documents into context, that single threshold is the most expensive line on the OpenAI pricing page that most people have not read.
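
The whole-session rule is easy to encode and worth checking against your own traffic shapes. A minimal sketch in Python, using the thresholds and rates above; the function name and structure are ours, not anything in OpenAI's API:

```python
def gpt55_cost(input_tokens: int, output_tokens: int) -> float:
    """Estimate a GPT-5.5 bill under the whole-session tier rule:
    crossing 272K input tokens reprices every token, not just the tail."""
    if input_tokens > 272_000:
        in_rate, out_rate = 10.00, 45.00   # $/M, surcharge tier
    else:
        in_rate, out_rate = 5.00, 30.00    # $/M, standard tier
    return (input_tokens * in_rate + output_tokens * out_rate) / 1_000_000

print(gpt55_cost(271_000, 20_000))  # 1.955 -- just under the cliff
print(gpt55_cost(800_000, 20_000))  # 8.9   -- whole session repriced
```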

Codex CLI caps context at 400K, which sits inside the surcharge zone. If you run agentic coding workloads through Codex on GPT-5.5, the surcharge fires by default any time the agent loads more than 272K of repository, logs, or test output. Plan for it or route long-context work elsewhere.

The 200K step on Gemini 3.1 Pro

Gemini's tier flip sits at a lower threshold (200K) but at the same multiples as GPT-5.5 (2x input, 1.5x output). The pricing page lists the rates as $0.00000200 input / $0.00001200 output per token below 200K, jumping to $0.00000400 and $0.00001800 above. Like OpenAI's rule, the higher tier applies to the entire session once you cross the line.

On the 800K / 20K shape from above, Gemini 3.1 Pro sits at $3.56. That is still less than half what GPT-5.5 charges for the same call, even with the tier flip active. The 200K threshold matters more for medium-sized requests. A 200K-in / 50K-out retrieval call bills $1.00 on tier 1. Let one extra chunk push input to 201K and the same call bills $1.70, a 70% premium applied to the entire bill.
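
The same rule as a sketch, with Google's thresholds plugged in; again, the function is illustrative, not an SDK call:

```python
def gemini31_cost(input_tokens: int, output_tokens: int) -> float:
    """Estimate a Gemini 3.1 Pro bill; the higher tier reprices the
    whole session once input crosses 200K, mirroring GPT-5.5's rule."""
    if input_tokens > 200_000:
        in_rate, out_rate = 4.00, 18.00   # $/M, tier 2
    else:
        in_rate, out_rate = 2.00, 12.00   # $/M, tier 1
    return (input_tokens * in_rate + output_tokens * out_rate) / 1_000_000

print(gemini31_cost(200_000, 50_000))  # 1.0   -- at the line
print(gemini31_cost(201_000, 50_000))  # 1.704 -- one chunk over, +70%
```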

If you run a retrieval pipeline that hovers around 200K input, treat the threshold as a hard budget. Truncate the lowest-relevance chunks rather than passing them through: trimming a few thousand tokens of marginal context avoids a 70% premium on the whole call.
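
One way to enforce that budget, sketched under assumptions: your retriever returns scored chunks, and you leave headroom below the line for the system prompt and question. The tuple shape and function name are hypothetical.

```python
def fit_to_budget(chunks, budget_tokens=190_000):
    """Keep the highest-relevance chunks that fit under the tier threshold.
    `chunks` is a list of (relevance_score, token_count, text) tuples;
    the default budget leaves ~10K of headroom below Gemini's 200K line."""
    kept, total = [], 0
    for score, n_tokens, text in sorted(chunks, key=lambda c: -c[0]):
        if total + n_tokens > budget_tokens:
            continue  # dropping a marginal chunk is cheaper than flipping the tier
        kept.append(text)
        total += n_tokens
    return kept, total
```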

Opus 4.7's tokenizer tax, in one sentence

Opus 4.7 is the only one of the three with a flat per-token rate at every context length. The rate is $5/$25, identical to Opus 4.6. What changed is that the same string of code, JSON, or natural language now produces 1.25 to 1.37x as many tokens as it did on Opus 4.6. Three weeks of production billing data from Finout, OpenRouter, CloudZero, and a few publicly shared invoices put the real-world inflation at 25 to 37 percent on chat, RAG, and coding workloads.

That math is documented in detail in our Opus 4.7 tokenizer post. For the comparison here, the takeaway is: when this article quotes Opus 4.7 at $0.50 on a 50K input call, the production version of that call is closer to $0.625. The post-tokenizer column matters more than the sticker for any planning forecast.
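
For forecasting, the adjustment is a single multiplier on token counts. A sketch, with 1.25 (the low end of the observed production range) as an explicit assumption:

```python
def opus47_cost(input_tokens: int, output_tokens: int,
                inflation: float = 1.25) -> float:
    """Estimate an Opus 4.7 bill from token counts measured on the old
    tokenizer. Rates are flat; `inflation=1.25` is an assumption at the
    low end of the 25-37% production inflation cited above."""
    return (input_tokens * inflation * 5.00 +
            output_tokens * inflation * 25.00) / 1_000_000

print(opus47_cost(50_000, 10_000))  # 0.625 -- the sticker-$0.50 call in production
```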

Five workloads, one winner, shifting margins

Bills, computed on real-world request shapes. Sticker math first; Opus 4.7 also shown with a +25% tokenizer adjustment to reflect what the bill actually looks like.

| Workload | GPT-5.5 | Opus 4.7 (sticker / +25%) | Gemini 3.1 Pro | Cheapest |
| --- | --- | --- | --- | --- |
| Casual coding (50K in / 10K out) | $0.55 | $0.50 / $0.625 | $0.22 | Gemini |
| Mid refactor at the line (200K in / 50K out) | $2.50 | $2.25 / $2.81 | $1.00 | Gemini |
| Refactor over the line (250K in / 50K out) | $2.75 | $2.50 / $3.13 | $1.90 | Gemini |
| Repo-scale agent (500K in / 100K out) | $9.50 | $5.00 / $6.25 | $3.80 | Gemini |
| Long-context audit (800K in / 20K out) | $8.90 | $4.50 / $5.63 | $3.56 | Gemini |
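
The whole table falls out of the three tier rules. A reproduction sketch, carrying the same +25% Opus inflation assumption as above:

```python
# ($/M input, $/M output) per tier, plus the input-token cliff, per the article
RATES = {
    "GPT-5.5":        {"below": (5, 30), "above": (10, 45), "cliff": 272_000},
    "Opus 4.7":       {"below": (5, 25), "above": (5, 25),  "cliff": None},
    "Gemini 3.1 Pro": {"below": (2, 12), "above": (4, 18),  "cliff": 200_000},
}

def cost(model, in_tok, out_tok, token_inflation=1.0):
    in_tok, out_tok = in_tok * token_inflation, out_tok * token_inflation
    r = RATES[model]
    tier = "above" if r["cliff"] and in_tok > r["cliff"] else "below"
    in_rate, out_rate = r[tier]
    return (in_tok * in_rate + out_tok * out_rate) / 1e6

WORKLOADS = [
    ("Casual coding",       50_000,  10_000),
    ("Mid refactor",       200_000,  50_000),
    ("Over the line",      250_000,  50_000),
    ("Repo-scale agent",   500_000, 100_000),
    ("Long-context audit", 800_000,  20_000),
]

for name, i, o in WORKLOADS:
    print(f"{name:18s}  GPT-5.5 ${cost('GPT-5.5', i, o):.2f}  "
          f"Opus ${cost('Opus 4.7', i, o, 1.25):.2f}  "
          f"Gemini ${cost('Gemini 3.1 Pro', i, o):.2f}")
```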

Every row says Gemini, which is the part of this comparison that nobody will be surprised by. The interesting numbers are the ratios. On row one, Gemini comes in at roughly 40% of GPT-5.5's bill. On row five, with the surcharge active, Gemini sits at about 40% again. The ratio holds even though both tiered models cross a cliff in the middle of the table; the one exception is row three, where Gemini's flip is active and GPT-5.5's is not yet, and the gap narrows to roughly 70%. Independent benchmarking by Artificial Analysis lands on a similar cost-per-task ratio at the median request size.

Opus 4.7's sticker stays competitive: it beats GPT-5.5 on every row and beats Gemini on none. Apply the tokenizer adjustment and Opus loses to GPT-5.5 on the small-input rows too. The point is not that one model dominates; it is that the shape of the request flips the local ordering of the trailing two models.

Where each model earns its premium

Cheapest does not mean best. The benchmark sheets explain what you are paying extra for when you do not pick Gemini.

| Benchmark | GPT-5.5 | Opus 4.7 | Gemini 3.1 Pro |
| --- | --- | --- | --- |
| SWE-bench Verified (coding) | ~74% | 87.6% | 80.6% |
| MMLU-Pro (knowledge) | 83.2% | ~82% | 75.8% |
| GPQA Diamond (graduate science) | 93.6% | 94.2% | 94.3% |
| MRCR v2 (long-context retrieval) | 74.0% @ 1M | 32.2% @ 1M | 84.9% @ 128K |

Three different leaders. Opus owns coding by a wide margin, both versus the other two and versus its own predecessor; the SWE-bench Verified jump from 79.4% on Opus 4.6 to 87.6% on 4.7 is the largest generational gain Anthropic has shipped on coding since Sonnet 3.5. GPT-5.5 owns knowledge breadth and the only credible 1M-token retrieval score in the bracket. Gemini owns short-window retrieval, science, and the price sheet.

Note the long-context regression on Opus 4.7. Anthropic published 78.3% on MRCR v2 for 4.6 and 32.2% for 4.7. Same benchmark, same harness, large drop. Several community runs have replicated it. Pair that with the 1.25-1.37x tokenizer inflation and Opus 4.7 is structurally a worse choice for long-context work than Opus 4.6 was, on top of being more expensive.

The 1M calls per month math

Same casual-coding shape, scaled to a million calls per month. Order-of-magnitude estimate, useful for budget conversations, not for a precise forecast.

| Model | Per call | 1M calls / month | Annualized |
| --- | --- | --- | --- |
| GPT-5.5 | $0.55 | $550,000 | $6.6M |
| Opus 4.7 (sticker) | $0.50 | $500,000 | $6.0M |
| Opus 4.7 (with +25% tokenizer) | $0.625 | $625,000 | $7.5M |
| Gemini 3.1 Pro | $0.22 | $220,000 | $2.6M |
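
The gaps in the last column, computed directly from the per-call figures in the table:

```python
per_call = {"GPT-5.5": 0.55, "Opus 4.7 adj": 0.625, "Gemini": 0.22}
annual = {m: c * 1_000_000 * 12 for m, c in per_call.items()}
print(annual["GPT-5.5"] - annual["Gemini"])       # ~3.96M: the GPT-5.5 vs Gemini gap
print(annual["Opus 4.7 adj"] - annual["Gemini"])  # ~4.86M: adjusted Opus vs Gemini
```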

$4.0 million annualized between GPT-5.5 and Gemini at the same volume. $4.9 million between Gemini and the tokenizer-corrected Opus. At that scale, the benchmark gap on a specific task type either pays for itself or it does not, and the answer matters in the millions. Run the eval before you sign the procurement contract.

Routing the request, not the procurement contract

For coding-heavy work where SWE-bench-class quality dominates the decision, Opus 4.7 is worth the tokenizer tax; the long-context regression only rules it out once your prompts run past 200K. For knowledge breadth or 1M-token retrieval, GPT-5.5 earns its sticker, but plan the 272K cliff into your context budget and cap sessions below it where you can.

For everything else, including the boring 80 percent of production traffic, Gemini 3.1 Pro costs less than half as much at every workload size and stays competitive on the benchmarks that aren't named SWE-bench. The 200K tier flip is a real cost event, but the post-flip price is still under what either competitor charges below the line.

If you cannot decide, route by request shape: small requests to Gemini, coding requests to Opus, and 1M-context retrieval to GPT-5.5. The hybrid stack costs less than committing to any single one of them, which is what most teams running real production load are quietly doing already.
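
If that hybrid stack is where you land, the routing logic is a few lines. A sketch only: the model names are shorthand, and the thresholds (200K for Opus's regression, 400K for the GPT-5.5 handoff) are assumptions to tune on your own eval:

```python
def route(task: str, input_tokens: int) -> str:
    """Hypothetical request-shape router following the rule of thumb above."""
    if task == "coding" and input_tokens <= 200_000:
        return "opus-4.7"        # SWE-bench leader; regression bites past 200K
    if input_tokens > 400_000:
        return "gpt-5.5"         # only credible 1M-token retrieval score
    return "gemini-3.1-pro"      # cheapest at every size; the default
```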

Sources