GPT-5.5 doubled the per-token price. The 40% efficiency claim does not get OpenAI back to even.
OpenAI shipped GPT-5.5 on April 23 at exactly 2x GPT-5.4 prices and pitched it as a wash because the model uses fewer tokens. The 40% figure shows up everywhere but traces back to a third-party benchmark, not OpenAI's own page. Run the numbers on three real workloads and they do not balance. Here is where GPT-5.5 still wins, where GPT-5.4 is the rational choice, and the long-context surcharge nobody is talking about.

Headline numbers first. GPT-5.5 lists at $5 per million input and $30 per million output, with cached input at $0.50. That is exactly 2x GPT-5.4's rate card. The 1.05M context window technically holds, but anything past 272K input flips the session into a higher SKU. GPT-5.5 Pro stayed flat at $30/$180. The hike only landed on Standard, which is the SKU most people actually use.
What doubled, what stayed the same
The Standard tier moved. The Pro tier did not. That asymmetry says something about who OpenAI thinks is paying for what.
| Tier | Input / 1M | Cached / 1M | Output / 1M | Notes |
|---|---|---|---|---|
| GPT-5.4 Standard | $2.50 | $0.25 | $15.00 | Previous flagship |
| GPT-5.5 Standard | $5.00 | $0.50 | $30.00 | 2x across the board |
| GPT-5.5 Standard (>272K input) | $10.00 | $1.00 | $45.00 | Long-context session surcharge |
| GPT-5.5 Pro | $30.00 | - | $180.00 | Same as GPT-5.4 Pro |
| GPT-5.5 Standard (Batch) | $2.50 | - | $15.00 | 50% off, 24h SLA |
| GPT-5.5 Standard (Priority) | $12.50 | - | $75.00 | 2.5x for guaranteed latency |
Pro tier holding flat is the tell. People paying $180 per million output tokens are already in a workload regime where token volume is small and quality is everything. Standard tier is where the bulk usage lives - chat apps, agent loops, RAG backends - and that is where OpenAI moved the lever.
About that 40% efficiency number
Every secondary writeup of the GPT-5.5 launch repeats some version of "uses roughly 40% fewer output tokens to complete the same Codex task." We went looking for the source.
OpenAI's own announcement page does not state a percentage. The closest statement is a Greg Brockman line reported by TechCrunch: "a faster, sharper thinker for fewer tokens compared to something like 5.4." No number, no benchmark name. Artificial Analysis ran GPT-5.5 through their Intelligence Index workload and reported a roughly 40% reduction in output tokens versus 5.4 - that appears to be the actual origin of the 40% figure that everyone is now citing as an OpenAI claim.
That distinction matters for two reasons. First, the 40% is measured on AA's test suite, not on Codex specifically, even though it gets repeated as a Codex-task number. Second, when a third-party benchmark and a vague vendor claim converge on the same headline figure across different testbeds, the most likely explanation is that one is being read into the other. The directional finding - 5.5 emits fewer tokens for equivalent work - is solid. The exact percentage is not.
Three workloads, two models, one math problem
Take the optimistic version of OpenAI's pitch and assume GPT-5.5 emits 40% fewer output tokens than GPT-5.4 for the same work. Hold input tokens constant (no one has claimed input compresses). Run that against three workload shapes a tokencost reader actually has, with the Codex loop shown at two sizes:
| Workload | GPT-5.4 | GPT-5.5 (raw) | GPT-5.5 (40% fewer out) | Net change |
|---|---|---|---|---|
| Support reply: 4K in, 1K out | $0.025 | $0.050 | $0.038 | +52% |
| Doc-grounded answer: 100K in, 10K out | $0.40 | $0.80 | $0.68 | +70% |
| Codex loop: 1M in, 100K out | $4.00 | $14.50 | $12.70 | +218% |
| Long Codex run: 10M in, 1M out | $40.00 | $145.00 | $127.00 | +218% |
All figures use standard list pricing. The Codex loop and long Codex run rows include the 272K long-context surcharge, which kicks the GPT-5.5 input rate to $10/M and output to $45/M for the entire session.
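The table can be reproduced with a few lines of Python. This is a minimal sketch, not a billing tool: the rates are copied from the rate card above, the function names are made up for illustration, and it assumes each workload is a single session that is billed entirely at the long-context tier once its input exceeds 272K.

```python
# USD per 1M tokens, copied from the rate table earlier in this article.
RATES = {
    "gpt-5.4": {"input": 2.50, "output": 15.00},
    "gpt-5.5": {"input": 5.00, "output": 30.00},
    "gpt-5.5-long": {"input": 10.00, "output": 45.00},
}
LONG_CONTEXT_THRESHOLD = 272_000  # input tokens; above this the long tier applies


def cost(tier, input_tokens, output_tokens):
    """List-price cost in USD for one session at the given tier."""
    rate = RATES[tier]
    return (input_tokens * rate["input"] + output_tokens * rate["output"]) / 1_000_000


def gpt55_cost(input_tokens, output_tokens, output_reduction=0.0):
    """GPT-5.5 cost, optionally assuming it emits fewer output tokens.

    Assumption: the whole session is billed at the long-context tier once
    its input exceeds the threshold, which is how the table treats it.
    """
    tier = "gpt-5.5-long" if input_tokens > LONG_CONTEXT_THRESHOLD else "gpt-5.5"
    return cost(tier, input_tokens, output_tokens * (1 - output_reduction))


# (input tokens, output tokens) for each row of the table.
WORKLOADS = {
    "Support reply": (4_000, 1_000),
    "Doc-grounded answer": (100_000, 10_000),
    "Codex loop": (1_000_000, 100_000),
    "Long Codex run": (10_000_000, 1_000_000),
}


def compare(output_reduction=0.40):
    """Return {workload: (gpt54, gpt55_raw, gpt55_reduced, net_change_pct)}."""
    rows = {}
    for name, (inp, out) in WORKLOADS.items():
        old = cost("gpt-5.4", inp, out)
        raw = gpt55_cost(inp, out)
        reduced = gpt55_cost(inp, out, output_reduction)
        rows[name] = (old, raw, reduced, (reduced / old - 1) * 100)
    return rows


if __name__ == "__main__":
    for name, (old, raw, reduced, pct) in compare().items():
        print(f"{name:20s} ${old:8.3f} ${raw:8.3f} ${reduced:8.3f} {pct:+.1f}%")
```

Changing `output_reduction` shows how sensitive each row is to the efficiency claim; swapping in your own token counts is the more useful exercise.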
Even granting OpenAI the full 40% output reduction, the math does not balance. The support-reply case ends up costing 52% more. The Codex loop more than triples, because the base rates doubled and then, past 272K, the input rate doubles again while the output rate climbs another 50%. The 40% efficiency gain only nibbles at the output side of a problem that has two multipliers stacked on top of each other.
The break-even is sharper than the headline suggests. Because every standard rate exactly doubled, GPT-5.5 costs the same as GPT-5.4 only when it bills about half as many tokens: a roughly 50% combined reduction across input and output, applied consistently. That requires the model to also reduce input tokens (which OpenAI has not claimed) or to cut down agent iteration counts (which depends on your scaffold, not the model).
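To see how far output savings alone would have to go, here is a small sketch of the break-even arithmetic. It assumes input tokens stay constant and that standard (under 272K) rates apply; the function name and defaults are illustrative, taken from the rate card above.

```python
def breakeven_output_reduction(input_tokens, output_tokens,
                               old_in=2.50, old_out=15.00,
                               new_in=5.00, new_out=30.00):
    """Fraction of output tokens GPT-5.5 must shed to match GPT-5.4's cost.

    Assumes input tokens are unchanged and standard (under 272K) rates apply.
    Rates are USD per 1M tokens; the 1M divisor cancels out of the ratio.
    Returns None when no output reduction is enough, i.e. when the GPT-5.5
    input bill alone already exceeds the whole GPT-5.4 bill.
    """
    if output_tokens <= 0:
        raise ValueError("output_tokens must be positive")
    old_cost = input_tokens * old_in + output_tokens * old_out
    budget_for_output = old_cost - input_tokens * new_in
    if budget_for_output < 0:
        return None
    return 1 - budget_for_output / (output_tokens * new_out)


if __name__ == "__main__":
    for label, inp, out in [
        ("output-heavy, 1K in / 10K out", 1_000, 10_000),
        ("support reply, 4K in / 1K out", 4_000, 1_000),
        ("doc-grounded, 100K in / 10K out", 100_000, 10_000),
    ]:
        print(label, breakeven_output_reduction(inp, out))
```

Under these assumptions an output-heavy job needs about a 51% output cut to break even, the support reply needs about 83%, and the doc-grounded answer returns `None`: its GPT-5.5 input bill ($0.50) is already larger than the entire GPT-5.4 bill ($0.40), so no amount of output savings closes the gap.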
The 272K cliff most coverage missed
OpenAI advertised GPT-5.5 with a 1.05M context window. What the docs page mentions in passing is that any session whose input crosses 272K tokens flips to a different SKU pricing for the duration of that session. Input becomes $10 per million, output becomes $45 per million, cached input becomes $1 per million.
That is 4x the input cost of GPT-5.4, not 2x, and 3x the output cost. And it applies across the whole session, not just the part above 272K. A coding agent that spends most of its life under 272K and occasionally tips over - because someone pasted a long file, or the conversation got long - pays the surcharge from the moment of crossing until the session ends.
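A quick way to feel the cliff is to price two nearly identical sessions on either side of it. The sketch below assumes the entire session is billed at the long-context rates once input crosses 272K; the exact billing mechanics are not spelled out in this article, so treat the function as illustrative rather than as OpenAI's metering logic.

```python
STANDARD = {"input": 5.00, "output": 30.00}  # USD per 1M tokens, GPT-5.5 Standard
LONG = {"input": 10.00, "output": 45.00}     # rates once input exceeds 272K
THRESHOLD = 272_000                          # input tokens


def session_cost(input_tokens, output_tokens):
    """Cost of one GPT-5.5 session in USD.

    Assumption: the entire session is billed at the long-context rates once
    its input crosses the threshold.
    """
    rates = LONG if input_tokens > THRESHOLD else STANDARD
    return (input_tokens * rates["input"] + output_tokens * rates["output"]) / 1_000_000


def cliff_penalty(input_below, input_above, output_tokens):
    """Extra dollars paid for tipping over the threshold, and the cost ratio."""
    below = session_cost(input_below, output_tokens)
    above = session_cost(input_above, output_tokens)
    return above - below, above / below


if __name__ == "__main__":
    extra, ratio = cliff_penalty(270_000, 280_000, 20_000)
    print(f"extra ${extra:.2f}, {ratio:.2f}x the cost")
```

With 20K output tokens, a session at 270K input comes to $1.95 and the same session at 280K input comes to $3.70, so the last 10K input tokens cost $1.75 under this reading of the surcharge.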
Practical implication: if you are building anything that traffics in large contexts, treat 272K as the actual cap on the standard rate. Above that, the cost-per-correct-output is materially worse than Gemini 3.1 Pro's tiered model (which steps from $2 to $4 input above 200K and from $12 to $18 output) or Claude Opus 4.7's flat $5/$25 rate, which does not punish you for the long context once you are there.
Where the price hike is actually worth paying
The cost analysis is necessary but not sufficient. There are three workload shapes where GPT-5.5 still pencils out, even at the doubled rate.
Codex-style agentic coding under 272K. This is the workload OpenAI optimized for and the place where token reduction matches their claim closely. SWE-Bench Pro at 58.6 beats GPT-5.4's 53.x range. Terminal-Bench 2.0 at 82.7 is well above anything 5.4 produced. If your agent loop fits under 272K and you measure success in PRs landed per dollar, 5.5 looks defensible. The headline win is in iteration count, not just per-call cost.
FrontierMath Tier 4 and similar. The score jumped from 27.1 to 35.4 with GPT-5.5. If you are paying for the top half-percent of math/research output, that delta is worth more than the price doubling because nothing else competes at this tier. This is also where Pro (which did not change price) is the right SKU.
Anything that previously required GPT-5.4 Pro's reasoning depth. If a workload was being served by Pro at $30/$180, GPT-5.5 Standard at $5/$30 might now be capable enough to displace it. That is a 6x reduction, not an increase. The release note explicitly positions 5.5 Standard as more capable than 5.4 Pro for many non-Pro tasks.
What else $5/$30 buys you in May 2026
Same token bill, different vendors, May 4, 2026:
| Model | Input / 1M | Output / 1M | Context | SWE-Bench Pro |
|---|---|---|---|---|
| GPT-5.5 | $5.00 | $30.00 | 1.05M* | 58.6 |
| Claude Opus 4.7 | $5.00 | $25.00 | 200K | 64.3 |
| Claude Sonnet 4.6 | $3.00 | $15.00 | 1M (beta retired Apr 30) | ~55 |
| Gemini 3.1 Pro | $2.00 / $4.00 | $12.00 / $18.00 | 1M (tiered) | ~56 |
| Grok 4.3 | $1.25 | $2.50 | 1M (2x >200K) | ~52 |
| DeepSeek V4 Pro (promo) | $0.435 | $0.87 | 1M | ~57 |
*GPT-5.5 nominal context. Input over 272K shifts the entire session into the long-context tier ($10 input / $45 output). DeepSeek V4 Pro promo runs through May 31, then reverts to $1.74/$3.48. SWE-Bench Pro figures from public leaderboards and provider system cards as of May 4, 2026.
Two things stand out. Opus 4.7 matches GPT-5.5 on input price, undercuts on output, and beats it by ~6 points on SWE-Bench Pro. Gemini 3.1 Pro is meaningfully cheaper and still in the same SWE-Bench Pro neighborhood, with a tiered context model that is gentler than GPT-5.5's cliff. The price doubling did not buy OpenAI a benchmark lead it did not already have - it pulled the rate card up to match what Anthropic was already charging while staying behind on the headline agentic coding number.
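To put the table in dollars, the sketch below prices one workload across all six rate cards. It uses only the base-tier list prices from the table, so it refuses inputs above 200K, where the Gemini, Grok, and GPT-5.5 long-context tiers start to differ. It compares price only and says nothing about quality; the function name is illustrative.

```python
# (input, output) list prices in USD per 1M tokens, from the table above.
# Base-tier rates only: valid for sessions at or under 200K input tokens.
PRICES = {
    "GPT-5.5": (5.00, 30.00),
    "Claude Opus 4.7": (5.00, 25.00),
    "Claude Sonnet 4.6": (3.00, 15.00),
    "Gemini 3.1 Pro": (2.00, 12.00),
    "Grok 4.3": (1.25, 2.50),
    "DeepSeek V4 Pro (promo)": (0.435, 0.87),
}
BASE_TIER_LIMIT = 200_000  # input tokens


def vendor_costs(input_tokens, output_tokens):
    """Return {model: cost in USD}, cheapest first, at base-tier list prices."""
    if input_tokens > BASE_TIER_LIMIT:
        raise ValueError("long-context tiers are not modeled in this sketch")
    costs = {
        name: (input_tokens * in_rate + output_tokens * out_rate) / 1_000_000
        for name, (in_rate, out_rate) in PRICES.items()
    }
    return dict(sorted(costs.items(), key=lambda item: item[1]))


if __name__ == "__main__":
    # The doc-grounded workload from earlier: 100K in, 10K out.
    for name, usd in vendor_costs(100_000, 10_000).items():
        print(f"{name:26s} ${usd:.4f}")
```

For the 100K-in, 10K-out doc-grounded workload this gives $0.80 for GPT-5.5, $0.75 for Opus 4.7, $0.45 for Sonnet 4.6, and $0.32 for Gemini 3.1 Pro.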
Pay double, or don't
Three buckets, three different answers. A Codex loop that lives under 272K and burns ~10x more output than input is the workload OpenAI built 5.5 for. The token reduction sticks closest there, the SWE-Bench Pro gain over 5.4 is real, and PRs landed per dollar can move the right way after the upgrade. Move it.
For doc-grounded answers, support chat, or anything weighted toward big input and modest output, the math says stay on GPT-5.4 until it gets deprecated. Or move sideways to Sonnet 4.6 or Gemini 3.1 Pro and pick up materially better cost-per-quality at the same tier without doubling the bill.
Anything that routinely crosses 272K is on the wrong rate card with GPT-5.5. Opus 4.7 with chunking has flat $5/$25 pricing that does not punish you mid-session. Gemini 3.1 Pro's tiered model is softer than the OpenAI cliff. Both will land cheaper for the same quality bar in long-context territory.
Sources
- Introducing GPT-5.5 - OpenAI announcement, April 23, 2026
- OpenAI API pricing documentation - developers.openai.com
- GPT-5.5 model page - includes long-context surcharge details
- OpenAI's GPT-5.5 is the new leading AI model - Artificial Analysis (origin of the 40% figure)
- OpenAI releases GPT-5.5 - TechCrunch, April 23, 2026 (Brockman quote)
- Claude Opus 4.7 release notes - Anthropic, for competitor pricing comparison
- Gemini 3 developer guide - Google AI, for tiered context pricing