GPT-5.4 computer use: what a real agent task actually costs
Computer use sessions are input-heavy by design. Screenshots compound with conversation history on every turn, GPT-5.4 has a pricing cliff at 272K tokens that retroactively doubles your input bill, and the $2.50/M baseline rate tells you less than you think. Here is the cost breakdown for real tasks.

GPT-5.4 charges $2.50 per million input tokens for computer use -- no premium over the standard API. The catch is that computer use sessions are input-heavy by nature: each turn carries the full conversation history plus a new screenshot. A 5-turn web lookup costs about $0.03. A 15-turn data extraction task runs around $0.32. And if your session crosses GPT-5.4's 272K input threshold -- which a long agentic workflow can do around turn 45-50 -- the input rate doubles to $5.00/M retroactively.
Why screenshots cost more than they look
Computer use works in a loop. You send a task, the model returns an action (click, type, scroll), your code executes it, takes a screenshot, and sends it back. Repeat until done. The screenshots are what drive the cost -- but not for the reason you might expect.
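A minimal sketch of that loop in Python, assuming the Responses-style API described in OpenAI's computer use guide. Here `extract_computer_call`, `execute_action`, and `take_screenshot` are hypothetical stand-ins for your own automation layer (Playwright, a VM driver, etc.), and the exact field names should be treated as illustrative rather than authoritative:

```python
# One iteration per model turn: the full history (screenshots included)
# is resent on every call, which is why context compounds.
# extract_computer_call(), execute_action(), and take_screenshot() are
# hypothetical stand-ins for your own automation layer.
import base64

def run_task(client, task: str, max_turns: int = 30):
    history = [{"role": "user", "content": task}]
    for _ in range(max_turns):
        response = client.responses.create(
            model="gpt-5.4",
            input=history,
            reasoning={"effort": "medium"},   # see the effort notes below
        )
        history += response.output            # model actions stay in context
        action = extract_computer_call(response)
        if action is None:                    # no action item: task is done
            return response
        execute_action(action)                # click / type / scroll locally
        screenshot = take_screenshot(width=1024, height=768)
        history.append({
            "type": "computer_call_output",
            "call_id": action.call_id,
            "output": {
                "type": "input_image",
                "image_url": "data:image/png;base64,"
                             + base64.b64encode(screenshot).decode(),
            },
        })
    return None
```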
At 1280x720, each screenshot runs roughly 1,200 tokens in GPT-5.4's 32-pixel patch system. That's about $0.003 per frame at $2.50/M. Cheap. The problem is that the full conversation history stays in context for every subsequent turn. By turn 10, you're sending around 15,000 tokens of context per call. By turn 20, it's closer to 35,000. The screenshots are just the part that keeps compounding.
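To put numbers on the compounding, here is a quick estimator using the per-frame figure at the recommended 1024x768 resolution. The flat per-turn overhead is an assumption; real late turns run heavier than this as extracted page text accumulates:

```python
# Rough per-call context size as turns accumulate. A flat overhead
# understates late turns, since extracted page text grows with the session.
SCREENSHOT_TOKENS = 1_050   # ~1024x768 frame, per the article
TURN_OVERHEAD = 450         # assumed: actions, instructions, tool results
INPUT_RATE = 2.50           # $/M input, under the 272K threshold

for turn in (5, 10, 20, 30):
    context = turn * (SCREENSHOT_TOKENS + TURN_OVERHEAD)
    cost = context * INPUT_RATE / 1e6
    print(f"turn {turn:>2}: ~{context:>6,} context tokens (~${cost:.3f} that call)")
```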
OpenAI's computer use guide explicitly recommends 1024x768, which runs about 1,050 tokens per screenshot. Switching to 1920x1080 -- the obvious default for a desktop environment -- costs 4-5x more tokens per frame with no improvement in click accuracy for most UI tasks. That single setting can take a meaningful bite out of the bill on anything longer than a handful of turns.
Reasoning effort compounds things further. GPT-5.4 supports five levels: none, low, medium, high, and xhigh. At high or xhigh, the model generates substantially more reasoning tokens on top of the base cost -- roughly 3-5x more per call. Most computer use tasks run fine at medium. High is worth testing on tasks with ambiguous UI or multi-step planning, but it should be a deliberate choice.
What typical tasks actually cost
These estimates use 1024x768 resolution and medium reasoning effort. Token counts are derived from OpenAI's computer use documentation and third-party session analysis. Real costs vary with page content length, tool call frequency, and how much text the model extracts from the UI. When we ran a 20-turn research session against a multi-page procurement portal, total input came in around 130K tokens -- well under the 272K threshold, but roughly 4x higher than we initially expected before accounting for context accumulation.
| Task | Turns | Approx. tokens | Cost (GPT-5.4) |
|---|---|---|---|
| Web lookup (navigate + read) | 5 | ~5K in / ~1K out | ~$0.03 |
| Form fill (3-5 fields) | 8 | ~20K in / ~3K out | ~$0.09 |
| Data extraction (scrape + format) | 15 | ~80K in / ~8K out | ~$0.32 |
| Research workflow (multi-page) | 30 | ~200K in / ~15K out | ~$0.73 |
| Enterprise automation (50+ steps) | 50+ | >300K in (cliff risk) | $1.50-5+ (see below) |
Estimates at 1024x768 and medium reasoning. Source: OpenAI computer use guide
The 272K cliff
GPT-5.4 has a 1,050,000-token context window. Most sessions never come close. But there's a pricing threshold at 272K input tokens that matters more than the total context size: cross it, and your input rate doubles from $2.50 to $5.00 per million. Output jumps from $15.00 to $22.50. Both rates apply retroactively to the entire session -- not just to the tokens over the line.
The math is abrupt. A 300K input / 20K output session costs $1.95 at long-context rates. Trim it to 250K input and 20K output -- 50K tokens less -- and the same session costs $0.93. You either pay $2.50/M or you pay $5.00/M. There's no gradual slope.
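The step function is simple to encode. A minimal sketch using the rates above (treating the boundary as exclusive at exactly 272K is an assumption):

```python
# Session cost under GPT-5.4's two-tier pricing. Crossing 272K input
# tokens retroactively reprices the whole session, hence the hard step.
CLIFF = 272_000

def session_cost(input_tokens: int, output_tokens: int) -> float:
    if input_tokens > CLIFF:
        in_rate, out_rate = 5.00, 22.50   # long-context rates, $/M
    else:
        in_rate, out_rate = 2.50, 15.00   # standard rates, $/M
    return (input_tokens * in_rate + output_tokens * out_rate) / 1e6

print(session_cost(300_000, 20_000))  # 1.95  -- over the cliff
print(session_cost(250_000, 20_000))  # 0.925 -- same work, under it
```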
At 1024x768 with typical context buildup, sessions generate roughly 5,000-7,000 input tokens per turn. That puts the 272K threshold around turn 40-50 for most tasks. Short workflows are well clear of it. A research agent that reads several full pages, processes their content, and builds a structured output can get there.
The practical fix is context checkpointing. At natural task milestones -- after completing a phase, after extracting a data batch -- summarize what you have, discard the old screenshots, and start a fresh session for the next phase. Each checkpoint resets token accumulation. It adds some pipeline complexity but keeps sessions in the cheap tier and makes debugging easier as a side effect.
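A sketch of that pattern, reusing the `run_task` loop from earlier; `summarize_progress` is a hypothetical helper that distills a finished phase into plain text:

```python
# Context checkpointing: each phase runs as a fresh session seeded with a
# compact text summary instead of the accumulated screenshot history.
def run_phased_task(client, phases: list[str]) -> str:
    carry = ""                                  # plain-text summary, no images
    for phase in phases:
        prompt = (f"Context from earlier phases:\n{carry}\n\n"
                  f"Task for this phase: {phase}")
        result = run_task(client, prompt)       # fresh session: tokens reset
        carry = summarize_progress(result)      # hypothetical: distill to text
    return carry
```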
GPT-5.4 vs Claude vs Gemini for computer use
All three major providers offer computer use APIs. The pricing differences are real and the tradeoffs go beyond cost per token.
| Model | Input / 1M | Output / 1M | OSWorld | Notes |
|---|---|---|---|---|
| GPT-5.4 | $2.50 | $15.00 | 75.0% | Doubles to $5/$22.50 over 272K input |
| GPT-5.4-mini | $0.75 | $4.50 | - | No OSWorld published; good for predictable tasks |
| Claude Sonnet 4.6 | $3.00 | $15.00 | 72.5% | No pricing cliff; +$0.08/session-hr (managed agents) |
| Claude Opus 4.6 | $5.00 | $25.00 | 72.7% | +$0.08/session-hr (managed agents) |
| Claude Haiku 4.5 | $1.00 | $5.00 | - | Cheapest; best for structured, predictable automation |
| Gemini 2.5 CU Preview | $1.25 | $10.00 | 69.2% | Preview only as of April 2026; doubles over 200K |
Sources: OpenAI pricing · Anthropic pricing · Google AI pricing
Claude doesn't have a cliff like GPT-5.4's 272K threshold. Sonnet 4.6 bills at $3.00/M input flat -- slightly pricier than GPT-5.4 for short sessions, but no retroactive doubling risk on long ones. That difference matters if your workflows vary in length or are hard to predict at design time.
Gemini 2.5 Computer Use Preview looks good on price, but "preview" carries real operational risk -- rate limits, API changes, and no production SLA. It's worth benchmarking now. Putting it in a production pipeline today is a different call.
What the OSWorld score actually tells you
GPT-5.4 posts the best published OSWorld-Verified score at 75.0%, against Claude Opus 4.6 at 72.7% and Sonnet 4.6 at 72.5%. OSWorld measures whether an agent successfully completes real UI tasks -- forms, file operations, application navigation -- so it is a more relevant benchmark for computer use than most.
Whether a 2-3 percentage point gap matters depends on what failure costs. At 10,000 tasks per month, GPT-5.4 at 75.0% completes around 250 more tasks than a model at 72.5%. If a failed run means a human picks up the task, that can be worth paying a higher per-token rate for. For a pipeline that runs once a day and retries at low cost, the benchmark gap probably is not the variable that matters most.
OSWorld is also diverse by design -- it covers many application types and OS contexts. If your agent spends most of its time in one specific application, the real-world performance gap could be larger or smaller than the headline number. Task-specific evaluation on your actual workflow is worth doing before committing to a provider based on benchmark scores alone.
Four settings worth checking before you scale
1. Set resolution to 1024x768. OpenAI recommends it explicitly in its computer use guide. At 1920x1080, you pay 4-5x more tokens per screenshot with no accuracy gain on standard UI tasks.
2. Run reasoning at medium, not high. High and xhigh generate 3-5x more reasoning tokens. Start at medium and increase only if task quality actually requires it.
3. Checkpoint context at task milestones. Summarize completed steps and start a fresh session for the next phase. Sessions under 272K input pay $2.50/M instead of $5.00/M.
4. Try gpt-5.4-mini on routine steps. At $0.75/M input and $4.50/M output, it's 70% cheaper on input than the standard model. Form filling, predictable navigation, and structured data entry are the tasks most worth testing first. The sketch below pulls all four settings together.
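As a reference, a minimal config sketch consolidating the four settings; the field names are illustrative placeholders, not a real client option set:

```python
# Defaults worth pinning before scaling up. All field names here are
# illustrative -- map them onto whatever client/config layer you use.
AGENT_DEFAULTS = {
    "display": {"width": 1024, "height": 768},  # recommended, not 1920x1080
    "reasoning_effort": "medium",               # raise only after testing
    "checkpoint_after_tokens": 200_000,         # safety margin under 272K
    "routine_step_model": "gpt-5.4-mini",       # cheap tier, predictable steps
}
```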
Sources
- GPT-5.4 API pricing - OpenAI Developer Documentation
- Computer use guide - OpenAI Developer Documentation
- Claude API pricing - Anthropic Documentation
- Claude computer use tool reference - Anthropic Documentation
- Gemini API pricing - Google AI for Developers