Skip to main content
TokenCost logoTokenCost
GuideJune 4, 2026·7 min read

GPT-5.5 on priority costs 2.5x list. Here is what the speed actually buys.

Most pricing talk fixates on the $5 and $30 sticker rate for GPT-5.5. There is a second set of numbers OpenAI quietly publishes next to it: $12.50 input and $75 output, the price of the priority serving tier. Same model, same answers, faster floor on latency, and a bill that runs two and a half times higher. We worked out when that trade is worth taking and when it is just a tax on tokens nobody is waiting for.

Night highway light trails streaking toward a city, illustrating GPT-5.5 priority processing speed

Photo by Marc-Olivier Jodoin on Unsplash

Priority processing is not a different model and it is not a quality upgrade. You flip one parameter on the request, service_tier="priority", and OpenAI routes that call through capacity it holds back for customers willing to pay for it. The tokens come out the same. They just come out faster, and with a floor under how slow they are allowed to get.

That floor is the whole product. On the standard tier, throughput is best-effort: most of the time it is quick, but under load your p50 latency drifts and you have no recourse. Priority puts a number on the guarantee. The question this post answers is whether the number on the invoice is worth the number on the clock.

What the premium actually covers

Microsoft documents the guarantee more precisely than OpenAI does. On Azure's priority tier, GPT-5.5 carries a latency target of 99% of requests sustaining more than 100 output tokens per second, measured as p50 over rolling five-minute windows. The older models on the same tier promise less: GPT-5.4, GPT-5.2, and GPT-5.1 each target 50 tokens per second, and GPT-4.1 sits at 80. So the newest flagship is also the one OpenAI is most confident it can serve fast.

OpenAI's own wording is vaguer, just "significantly lower and more consistent latency." If you want a contractual tokens-per-second figure to design against, the Azure docs are where it lives. Either way, the thing you are renting is the tail of the latency distribution, not the median. Standard is already fast when nothing is wrong. Priority is what you buy for the afternoons when everything is.

The price of speed, model by model

Four models have a published priority rate today. Three of them double. GPT-5.5 is the exception, and not in your favor: it charges a 2.5x premium where the rest charge 2x. Whatever OpenAI is doing to hit that 100 tokens-per-second target on its top model, it is passing the cost of it straight through.

ModelStd inPriority inStd outPriority outPremium
gpt-5.5$5.00$12.50$30.00$75.002.5x
gpt-5.4$2.50$5.00$15.00$30.002.0x
gpt-5.4-mini$0.75$1.50$4.50$9.002.0x
gpt-5.3-codex$1.75$3.50$14.00$28.002.0x

All rates per 1M tokens. Cached input scales the same way: GPT-5.5 goes from $0.50 to $1.25, also 2.5x. The Pro variants, gpt-5.4-nano, fine-tuned models, and embeddings have no priority listing at all, so those calls only ever run at standard speed.

What it adds up to per thousand calls

Priority is built for interactive traffic, so the right unit is a short request repeated a lot. Take a typical chat or agent turn at 40K input tokens and a 4K answer, and price a thousand of them. The delta column is the toll you pay purely for the faster tier.

ModelStandard / 1KPriority / 1KSpeed toll
gpt-5.5$320$800+$480
gpt-5.3-codex$126$252+$126
gpt-5.4$160$320+$160
gpt-5.4-mini$48$96+$48

On their own these look small. Scale them. A product serving 100,000 of those GPT-5.5 turns a month pays $32,000 on standard and $80,000 on priority. The $48,000 gap is not buying better output. It is buying the difference between a median response and a response you can put a service-level agreement around. For a consumer app where slow feels broken, that can be the best $48,000 you spend. For a nightly summarization job nobody is watching, it is $48,000 lit on fire.

Where priority quietly stops applying

The tier has hard edges, and a couple of them can bill you at a rate you did not expect. Worth knowing before you wire it into production:

  • Long context is out. OpenAI lists long context as unsupported, and Azure puts a number on it: any request estimated above 128K prompt tokens drops to standard processing and bills at the standard rate. Priority is a short-context product by design. If you also care about the over-272K surcharge on standard, that is a separate cliff we mapped here.
  • Fine-tuned models and embeddings get nothing. Neither is eligible. If your stack leans on a fine-tune, priority is simply not on the menu for those calls, whatever you set the parameter to.
  • Sudden ramps get downgraded. Push past 1M tokens per minute and then spike usage more than 50% inside fifteen minutes, and OpenAI may quietly route the overflow to standard. The response comes back tagged service_tier="default", which is your only signal that you got standard latency on a request you meant to fast-track.

The downgrade behavior is the one that stings, because you keep paying priority prices on everything that does land in the fast lane while the spillover silently runs standard. Read the service_tier field on every response if the guarantee is load-bearing for you.

Does a human notice the latency?

That is the whole decision, and it is close to binary. When a person is waiting on the tokens, the premium often pays for itself in retention you never see on a dashboard. Voice agents fall apart at 50 tokens per second and feel natural at 100, which is exactly the gap priority guarantees for GPT-5.5. Interactive coding assistants, live chat, anything where a user is staring at a cursor, all live in the same bucket. There, paying double to keep the p99 honest is a defensible line item.

Flip the question and the answer flips too. Background agents, batch enrichment, nightly report generation, anything where the result lands in a queue rather than in front of a person, has no business on priority. For those, the smart move is the opposite end of the menu: batch and flex run GPT-5.5 at $2.50 and $15, half the standard rate and a fifth of priority. Same tokens, a fifth of the bill, and a latency profile nobody will ever feel.

The mistake we see most is treating priority as a default rather than a decision. It is the most expensive way to call the model, by a wide margin, and it earns that price only on the slice of traffic where speed is the product. Route the rest somewhere cheaper.

Put a real number on the speed toll

Feed your actual request shape and monthly volume into the calculator, then compare the standard, batch, and priority rates side by side before you commit a single line of code to the fast lane. GPT-5.5 and every tier it ships with sit on the pricing page next to the models you might route to instead.

Common questions

What is OpenAI's priority processing tier?

It is a serving tier you opt into with service_tier=priority. The model and output are identical to standard; you pay more for lower, more consistent latency, with a tokens-per-second floor that the standard tier does not promise.

How much more does GPT-5.5 cost on priority?

$12.50 input and $75 output per million, versus $5 and $30 on standard. That is exactly 2.5x. Cached input goes from $0.50 to $1.25. The other three models with a priority price all sit at a flat 2x, so GPT-5.5 carries the steepest premium.

Can I use priority processing with long context?

No. Long context, fine-tuned models, and embeddings are unsupported. On Azure, any request estimated above 128K prompt tokens is downgraded to standard and billed at the standard rate. Priority is for short, latency-sensitive calls only.

Which models have a priority processing price?

Four as of June 2026: gpt-5.5 at 2.5x, plus gpt-5.4, gpt-5.4-mini, and gpt-5.3-codex each at 2x standard. Pro variants, nano, fine-tuned models, and embeddings have no published priority rate.

Sources