How much more does GPT-5.5 cost than GPT-5.4?

Exactly twice as much in both directions. GPT-5.5 lists at $5 per million input tokens and $30 per million output tokens, versus GPT-5.4 at $2.50 input and $15 output. Cached input doubled too, from $0.25 to $0.50 per million. Pro tier is unchanged at $30 in / $180 out for both generations.

Does GPT-5.5 really use 40% fewer tokens?

OpenAI claims fewer tokens for equivalent Codex tasks. The widely repeated 40% figure traces back to Artificial Analysis measuring on its own Intelligence Index workload, not OpenAI publishing the number directly. Greg Brockman called it 'a faster, sharper thinker for fewer tokens compared to something like 5.4' without putting a percentage on it. Treat 40% as directional, not load-bearing.

When is GPT-5.5 actually cheaper than GPT-5.4?

Almost never on raw token math. Even applying a full 40% output token reduction, GPT-5.5 still costs about 1.5x to 3.2x as much as GPT-5.4 across the workloads we modeled. Because every Standard rate doubled, the break-even requires roughly 50% combined token reduction across input and output, which only shows up in narrow Codex-style agent loops where the model also runs for fewer turns.

What is the GPT-5.5 long-context surcharge?

Inputs over 272K tokens trigger a 2x input rate and 1.5x output rate for the entire session. So a 1M input call jumps from $5 to $10 per million input, and outputs in that session move from $30 to $45 per million. The full 1.05M context window is technically available, but roughly the back three-quarters of it is priced as a different SKU.

Research · May 4, 2026 · 10 min read

GPT-5.5 doubled the per-token price. The 40% efficiency claim does not get OpenAI back to even.

OpenAI shipped GPT-5.5 on April 23 at exactly 2x GPT-5.4 prices and pitched it as a wash because the model uses fewer tokens. The 40% figure shows up everywhere but traces back to a third-party benchmark, not OpenAI's own page. Run the numbers on three real workloads and they do not balance. Here is where GPT-5.5 still wins, where GPT-5.4 is the rational choice, and the long-context surcharge nobody is talking about.

Headline numbers first. GPT-5.5 lists at $5 per million input and $30 per million output, with cached input at $0.50. That is exactly 2x GPT-5.4's rate card. The 1.05M context window technically holds, but anything past 272K input flips the session into a higher SKU. GPT-5.5 Pro stayed flat at $30/$180. The hike only landed on Standard, which is the SKU most people actually use.

What doubled, what stayed the same

The Standard tier moved. The Pro tier did not. That asymmetry says something about who OpenAI thinks is paying for what.

Tier	Input / 1M	Cached / 1M	Output / 1M	Notes
GPT-5.4 Standard	$2.50	$0.25	$15.00	Previous flagship
GPT-5.5 Standard	$5.00	$0.50	$30.00	2x across the board
GPT-5.5 Standard (>272K input)	$10.00	$1.00	$45.00	Long-context session surcharge
GPT-5.5 Pro	$30.00	-	$180.00	Same as GPT-5.4 Pro
GPT-5.5 Standard (Batch)	$2.50	-	$15.00	50% off, 24h SLA
GPT-5.5 Standard (Priority)	$12.50	-	$75.00	2.5x for guaranteed latency

Pro tier holding flat is the tell. People paying $180 per million output tokens are already in a workload regime where token volume is small and quality is everything. Standard tier is where the bulk usage lives - chat apps, agent loops, RAG backends - and that is where OpenAI moved the lever.

About that 40% efficiency number

Every secondary writeup of the GPT-5.5 launch repeats some version of "uses roughly 40% fewer output tokens to complete the same Codex task." We went looking for the source.

OpenAI's own announcement page does not state a percentage. The closest direct quote is from Greg Brockman, as reported by TechCrunch: "a faster, sharper thinker for fewer tokens compared to something like 5.4." No number, no benchmark name. Artificial Analysis ran GPT-5.5 through its Intelligence Index workload and reported a roughly 40% reduction in output tokens versus 5.4 - that appears to be the actual origin of the 40% figure that everyone is now citing as an OpenAI claim.

That distinction matters for two reasons. First, the 40% is measured on AA's test suite, not on Codex specifically, even though it gets repeated as a Codex-task number. Second, when a third-party benchmark and a vague vendor claim converge on the same headline figure across different testbeds, the most likely explanation is that one is being read into the other. The directional finding - 5.5 emits fewer tokens for equivalent work - is solid. The exact percentage is not.

Three workloads, two models, one math problem

Take the optimistic version of OpenAI's pitch and assume GPT-5.5 emits 40% fewer output tokens than GPT-5.4 for the same work. Hold input tokens constant (no one has claimed input compresses). Run three workload shapes a tokencost reader actually has, with the Codex loop shown at two scales:

Workload	GPT-5.4	GPT-5.5 (raw)	GPT-5.5 (40% fewer out)	Net change
Support reply: 4K in, 1K out	$0.025	$0.050	$0.038	+52%
Doc-grounded answer: 100K in, 10K out	$0.40	$0.80	$0.68	+70%
Codex loop: 1M in, 100K out	$4.00	$14.50	$12.70	+218%
Long Codex run: 10M in, 1M out	$40.00	$145.00	$127.00	+218%

All figures use standard list pricing. The Codex loop and Long Codex run rows include the 272K long-context surcharge, which kicks the GPT-5.5 input rate to $10/M and output to $45/M for the entire session.

Even granting OpenAI the full 40% output reduction, the math does not balance. The support-reply case ends up costing 52% more. The Codex-loop case more than triples, because input doubled, output doubled, then the input rate doubled again and the output rate rose another 1.5x past 272K. The 40% efficiency only nibbles at the output side of a problem that has two multipliers stacking on top.
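The table's arithmetic is easy to reproduce. The sketch below is a minimal calculator, not an official pricing tool: the rates are copied from the rate card above, the function and variable names are our own, and it assumes (as the table does) that a workload whose input exceeds 272K is billed entirely at the long-context rate.

```python
# Rates in USD per million tokens, copied from the rate-card table.
GPT_5_4 = {"input": 2.50, "output": 15.00}
GPT_5_5 = {"input": 5.00, "output": 30.00}
GPT_5_5_LONG = {"input": 10.00, "output": 45.00}  # input over 272K

LONG_CONTEXT_THRESHOLD = 272_000  # tokens


def cost(rates, input_tokens, output_tokens):
    """Dollar cost of one workload at the given per-million rates."""
    return (input_tokens * rates["input"] + output_tokens * rates["output"]) / 1_000_000


def gpt_5_5_cost(input_tokens, output_tokens, output_reduction=0.0):
    """GPT-5.5 cost with an assumed fractional cut in output tokens.

    Assumption: the whole workload is billed at the long-context rate
    once its input exceeds 272K, matching the table's footnote.
    """
    rates = GPT_5_5_LONG if input_tokens > LONG_CONTEXT_THRESHOLD else GPT_5_5
    return cost(rates, input_tokens, output_tokens * (1 - output_reduction))


# (input tokens, output tokens) for each row of the workload table.
WORKLOADS = {
    "Support reply": (4_000, 1_000),
    "Doc-grounded answer": (100_000, 10_000),
    "Codex loop": (1_000_000, 100_000),
    "Long Codex run": (10_000_000, 1_000_000),
}

for name, (tokens_in, tokens_out) in WORKLOADS.items():
    old = cost(GPT_5_4, tokens_in, tokens_out)
    new = gpt_5_5_cost(tokens_in, tokens_out, output_reduction=0.40)
    print(f"{name}: ${old:.3f} -> ${new:.3f} ({new / old - 1:+.0%})")
```

Swapping in your own token counts, or a different `output_reduction`, shows how quickly the gap moves. The input side dominates every row except the support reply, which is why trimming output alone cannot close it.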

The break-even is sharper than the headline suggests. To make GPT-5.5 cost the same as GPT-5.4 on a typical 1:10 input/output workload at standard pricing, you need roughly 50% combined token reduction across input and output, applied consistently. That requires the model to also reduce input tokens (which OpenAI has not claimed) or to cut down agent iteration counts (which depends on your scaffold, not the model).
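To make the break-even concrete, here is a rough sketch under the same standard list rates (function names are ours, and the long-context surcharge is ignored). It computes two things: the uniform token cut needed to offset a given price multiplier, and the output-only cut needed when input is held constant.

```python
def uniform_break_even(price_multiplier):
    """Fraction by which ALL tokens must shrink to offset a uniform price multiplier."""
    return 1 - 1 / price_multiplier


def required_output_reduction(input_tokens, output_tokens,
                              old=(2.50, 15.00), new=(5.00, 30.00)):
    """Fraction of output tokens the new model must shed to match the old bill,
    holding input tokens constant. Rates are (input, output) in USD per million
    tokens; the per-million scaling cancels out of the ratio, so it is omitted.

    Returns None when input alone already costs as much as the whole old bill,
    so no output reduction can break even.
    """
    old_bill = input_tokens * old[0] + output_tokens * old[1]
    left_for_output = old_bill - input_tokens * new[0]
    if left_for_output <= 0:
        return None
    affordable_output_tokens = left_for_output / new[1]
    return 1 - affordable_output_tokens / output_tokens


print(uniform_break_even(2.0))                       # uniform cut for a 2x price
print(required_output_reduction(4_000, 1_000))       # support reply shape
print(required_output_reduction(100_000, 10_000))    # doc-grounded shape
```

With input held constant, the support-reply shape needs an output cut of roughly 83% to break even, and the doc-grounded shape cannot break even at all: its input alone costs $0.50 on GPT-5.5, more than the full $0.40 GPT-5.4 bill.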

The 272K cliff most coverage missed

OpenAI advertised GPT-5.5 with a 1.05M context window. What the docs page mentions in passing is that any session whose input crosses 272K tokens flips to a different pricing SKU for the duration of that session. Input becomes $10 per million, output becomes $45 per million, cached input becomes $1 per million.

That is 4x the input cost of GPT-5.4, not 2x. And it applies across the whole session, not just the part above 272K. A coding agent that spends most of its life under 272K and occasionally tips over - because someone pasted a long file, or the conversation got long - pays the surcharge from the moment of crossing until the session ends.
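A small simulation shows how one oversized request changes a session's bill. This is a sketch under stated assumptions, not a model of OpenAI's actual billing logic: it treats a session as an ordered list of requests, applies the long-context rate from the first request whose input exceeds 272K until the session ends, and ignores cached input. The example session is hypothetical.

```python
STANDARD = (5.00, 30.00)   # (input, output) USD per million tokens
LONG = (10.00, 45.00)      # long-context rates from the article
THRESHOLD = 272_000        # input tokens


def session_cost(calls):
    """Cost of a session given ordered (input_tokens, output_tokens) requests.

    Assumption: once any request's input exceeds the threshold, that request
    and every later one in the session is billed at the long-context rate.
    """
    crossed = False
    total = 0.0
    for input_tokens, output_tokens in calls:
        if input_tokens > THRESHOLD:
            crossed = True
        in_rate, out_rate = LONG if crossed else STANDARD
        total += (input_tokens * in_rate + output_tokens * out_rate) / 1_000_000
    return total


# Hypothetical agent session: seven turns, one of which pastes in a long file.
steady = [(100_000, 5_000)] * 7
one_spike = [(100_000, 5_000)] * 3 + [(300_000, 5_000)] + [(100_000, 5_000)] * 3

print(f"never crosses 272K: ${session_cost(steady):.2f}")
print(f"crosses once:       ${session_cost(one_spike):.2f}")
```

In this made-up session the single spike nearly doubles the bill, and most of the extra cost comes from the ordinary turns that follow it rather than from the oversized request itself.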

Practical implication: if you are building anything that traffics in large contexts, treat 272K as the actual cap on the standard rate. Above that, the cost-per-correct-output is materially worse than Gemini 3.1 Pro's tiered model (which steps from $2 to $4 input and from $12 to $18 output above 200K) or Claude Opus 4.7's flat $5/$25 rate, which does not penalize long context once you are there.

Where the price hike is actually worth paying

The cost analysis is necessary but not sufficient. There are three workload shapes where GPT-5.5 still pencils out, even at the doubled rate.

Codex-style agentic coding under 272K. This is the workload OpenAI optimized for and the place where token reduction matches their claim closely. SWE-Bench Pro at 58.6 beats GPT-5.4's 53.x range. Terminal-Bench 2.0 at 82.7 is well above anything 5.4 produced. If your agent loop fits under 272K and you measure success in PRs landed per dollar, 5.5 looks defensible. The headline win is in iteration count, not just per-call cost.
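The iteration-count point can be put in numbers. The sketch below assumes every agent turn has the same token shape, stays under 272K, and gets the full 40% output reduction; the turn shape and the 20-turn task are hypothetical, and the function names are ours. It asks how many turns GPT-5.5 can spend on a task before it costs more than GPT-5.4 did.

```python
import math


def per_turn_cost(input_tokens, output_tokens, in_rate, out_rate):
    """Dollar cost of one agent turn at per-million-token rates."""
    return (input_tokens * in_rate + output_tokens * out_rate) / 1_000_000


def max_turn_ratio(input_tokens, output_tokens, output_reduction=0.40):
    """Largest share of GPT-5.4's turn count that GPT-5.5 can use
    while costing no more per task, at standard list rates."""
    old = per_turn_cost(input_tokens, output_tokens, 2.50, 15.00)
    new = per_turn_cost(input_tokens, output_tokens * (1 - output_reduction), 5.00, 30.00)
    return old / new


# Hypothetical turn shape: 100K tokens in, 10K tokens out.
ratio = max_turn_ratio(100_000, 10_000)
turns_on_5_4 = 20  # hypothetical task length on GPT-5.4
print(f"GPT-5.5 must finish in at most {ratio:.0%} of the turns")
print(f"i.e. {math.floor(turns_on_5_4 * ratio)} turns instead of {turns_on_5_4}")
```

For that shape, GPT-5.5 has to finish the task in under 60% of the turns to break even on cost. Whether it does is something to measure on your own scaffold rather than assume.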

FrontierMath Tier 4 and similar. GPT-5.5 jumped from 27.1 to 35.4. If you are paying for the top half-percent of math/research output, that delta is worth more than the price doubling because nothing else competes at this tier. This is also where Pro (which did not change price) is the right SKU.

Anything that previously required GPT-5.4 Pro's reasoning depth. If a workload was being served by Pro at $30/$180, GPT-5.5 Standard at $5/$30 might now be capable enough to displace it. That is a 6x reduction, not an increase. The release note explicitly positions 5.5 Standard as more capable than 5.4 Pro for many non-Pro tasks.

What else $5/$30 buys you in May 2026

Same token bill, different vendors, May 4, 2026:

Model	Input / 1M	Output / 1M	Context	SWE-Bench Pro
GPT-5.5	$5.00	$30.00	1.05M*	58.6
Claude Opus 4.7	$5.00	$25.00	200K	64.3
Claude Sonnet 4.6	$3.00	$15.00	1M (beta retired Apr 30)	~55
Gemini 3.1 Pro	$2.00 / $4.00	$12.00 / $18.00	1M (tiered)	~56
Grok 4.3	$1.25	$2.50	1M (2x >200K)	~52
DeepSeek V4 Pro (promo)	$0.435	$0.87	1M	~57

*GPT-5.5 nominal context. Input over 272K shifts the entire session into the long-context tier ($10 input / $45 output). DeepSeek V4 Pro promo runs through May 31, then reverts to $1.74/$3.48. SWE-Bench Pro figures from public leaderboards and provider system cards as of May 4, 2026.

Two things stand out. Opus 4.7 matches GPT-5.5 on input price, undercuts on output, and beats it by ~6 points on SWE-Bench Pro. Gemini 3.1 Pro is meaningfully cheaper and still in the same SWE-Bench Pro neighborhood, with a tiered context model that is gentler than GPT-5.5's cliff. The price doubling did not buy OpenAI a benchmark lead it did not already have - it pulled the rate card up to match what Anthropic was already charging while staying behind on the headline agentic coding number.
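To see what the rate table means for a single bill, the sketch below prices one workload across all six models. Rates are copied from the table; the helper name is ours. It only handles inputs up to 200K tokens, so that none of the tiered or long-context rates apply, and it says nothing about quality differences between the models.

```python
# (input, output) USD per million tokens, base tier, from the comparison table.
RATES = {
    "GPT-5.5": (5.00, 30.00),
    "Claude Opus 4.7": (5.00, 25.00),
    "Claude Sonnet 4.6": (3.00, 15.00),
    "Gemini 3.1 Pro": (2.00, 12.00),
    "Grok 4.3": (1.25, 2.50),
    "DeepSeek V4 Pro (promo)": (0.435, 0.87),
}

BASE_TIER_LIMIT = 200_000  # above this, several vendors switch to higher rates


def rank_by_cost(input_tokens, output_tokens):
    """Return (model, dollars) pairs sorted cheapest first, base tier only."""
    if input_tokens > BASE_TIER_LIMIT:
        raise ValueError("this sketch only covers inputs up to 200K tokens")
    costs = {
        model: (input_tokens * in_rate + output_tokens * out_rate) / 1_000_000
        for model, (in_rate, out_rate) in RATES.items()
    }
    return sorted(costs.items(), key=lambda item: item[1])


# The doc-grounded workload from earlier: 100K in, 10K out.
for model, dollars in rank_by_cost(100_000, 10_000):
    print(f"{model:26s} ${dollars:.4f}")
```

On the doc-grounded shape, GPT-5.5 comes out as the most expensive of the six at $0.80 per call, with Opus 4.7 just behind at $0.75 and Gemini 3.1 Pro at $0.32.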

Pay double, or don't

Three buckets, three different answers. A Codex loop that lives under 272K and burns ~10x more output than input is the workload OpenAI built 5.5 for. The token reduction sticks closest there, the SWE-Bench Pro gain over 5.4 is real, and PRs landed per dollar can move the right way after the upgrade. Move it.

For doc-grounded answers, support chat, or anything weighted toward big input and modest output, the math says stay on GPT-5.4 until it gets deprecated. Or move sideways to Sonnet 4.6 or Gemini 3.1 Pro and pick up materially better cost-per-quality at the same tier without doubling the bill.

Anything that routinely crosses 272K is on the wrong rate card with GPT-5.5. Opus 4.7 with chunking has flat $5/$25 pricing that does not punish you mid-session. Gemini 3.1 Pro's tiered model is softer than the OpenAI cliff. Both will land cheaper for the same quality bar in long-context territory.

Sources

Introducing GPT-5.5 - OpenAI announcement, April 23, 2026
OpenAI API pricing documentation - developers.openai.com
GPT-5.5 model page - includes long-context surcharge details
OpenAI's GPT-5.5 is the new leading AI model - Artificial Analysis (origin of the 40% figure)
OpenAI releases GPT-5.5 - TechCrunch, April 23, 2026 (Brockman quote)
Claude Opus 4.7 release notes - Anthropic, for competitor pricing comparison
Gemini 3 developer guide - Google AI, for tiered context pricing
