Grok Build runs coding agents at $1 in, $2 out. The catch is that xAI published zero benchmarks.
xAI shipped its own terminal coding agent on May 14 and quietly listed the model behind it, grok-build-0.1, in the API on May 20. The rate is $1.00 input and $2.00 output per million tokens. On output, that is a seventh of what OpenAI's Codex model charges and roughly a twelfth of Claude Opus 4.7's rate. The price is the easy part to write about. The hard part is that xAI shipped it with no SWE-Bench number, no coding eval, nothing. You are being asked to route real work to a model on price alone.

The short of it
- Grok Build is the CLI (announced May 14, sold through a SuperGrok subscription). grok-build-0.1 is the API model behind it, listed May 20 at $1.00 in, $0.20 cached, and $2.00 out per million tokens, with 256K context.
- A 500K-input, 80K-output coding task runs $0.66 on it, against $2.00 on OpenAI's Codex model and $4.50 on Claude Code with Opus 4.7. For most agent workloads, output price is what separates the bills.
- xAI published no benchmarks. None at all. The cheap rate comes with a real question mark on quality.
Grok Build the CLI is not grok-build-0.1 the model
Most of the coverage conflates these, so it is worth pulling them apart before any pricing makes sense. On May 14 xAI announced Grok Build, a command-line coding agent. It plans before it edits, shows clean diffs, runs subagents in parallel git worktrees, has a headless mode for automation, and reads your existing AGENTS.md, MCP servers, hooks, and skills without conversion. If that description sounds like Claude Code or the Codex CLI, that is the point. xAI built it to be a drop-in for developers already using one of those.
The CLI itself is distributed through a subscription, not metered per token. Press coverage puts the SuperGrok Heavy tier that includes it around $299 a month, with a discounted intro rate, but those figures come from reporters, not from xAI's own page, so treat them as approximate.
Six days later, on May 20, the model that powers it showed up in the pay-as-you-go API as grok-build-0.1. That is the one with a rate card, and the one that matters if you are wiring it into your own agent instead of using xAI's CLI. One telling detail: the model page lists grok-code-fast-1 aliases for grok-build-0.1, which marks it as the next step in xAI's existing coding line rather than a from-scratch first model. Do not let anyone tell you it is xAI's first coding model. It is the successor to one they already had.
The rate card next to its rivals
Here is what the model behind each major coding agent costs per million tokens, pulled from each provider's own pricing page.
| Model | Input / 1M | Cached / 1M | Output / 1M | Context |
|---|---|---|---|---|
| grok-build-0.1 | $1.00 | $0.20 | $2.00 | 256K |
| grok-code-fast-1 (predecessor) | $0.20 | n/a | $1.50 | 256K |
| gpt-5.3-codex | $1.75 | $0.175 | $14.00 | 272K |
| Claude Sonnet 4.6 | $3.00 | $0.30 | $15.00 | 1M |
| Claude Opus 4.7 | $5.00 | $0.50 | $25.00 | 1M |
Two things stand out. First, on output, the number that drives coding-agent bills, grok-build-0.1 at $2.00 is 7x under gpt-5.3-codex and 12.5x under Opus 4.7. Second, xAI's own lineup undercuts it: the older grok-code-fast-1 model it builds on is cheaper on both input ($0.20 vs $1.00) and output ($1.50 vs $2.00). If raw cost is all you care about and you do not need whatever grok-build-0.1 added, the predecessor is still the bargain.
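To make the output-price multiples easy to check, here is a small sketch that encodes the rate card above as a dictionary and divides each model's output rate by grok-build-0.1's. The dictionary keys are descriptive labels chosen for this example, not necessarily the providers' API model IDs, and the prices are only the list prices quoted in the table.

```python
# Per-million-token list prices in USD, copied from the rate card above.
# Tuple layout: (input, cached input, output); None means no cached rate is listed.
# Keys are descriptive labels, not necessarily the providers' API model IDs.
RATES = {
    "grok-build-0.1": (1.00, 0.20, 2.00),
    "grok-code-fast-1": (0.20, None, 1.50),
    "gpt-5.3-codex": (1.75, 0.175, 14.00),
    "claude-sonnet-4.6": (3.00, 0.30, 15.00),
    "claude-opus-4.7": (5.00, 0.50, 25.00),
}


def output_multiple(model: str, baseline: str = "grok-build-0.1") -> float:
    """Ratio of `model`'s output price to `baseline`'s output price."""
    return RATES[model][2] / RATES[baseline][2]


for name in RATES:
    print(f"{name:>18}: {output_multiple(name):5.2f}x the grok-build-0.1 output rate")
```

Codex comes out at 7.00x and Opus at 12.50x, while the predecessor, at 0.75x, is the only entry below 1.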
What a real coding task costs
Rate cards lie about coding agents because the input-to-output ratio is so lopsided. An agent reads a lot and writes less, but the writes are where the expensive output tokens land. Take one concrete task: an agent ingests roughly 500K tokens of context (a medium repo, the relevant files, a few iterations of reading) and produces 80K tokens of output (a feature plus the diffs, tests, and explanation). Here is the bill on each.
| Coding agent | Model | Input cost | Output cost | Task total |
|---|---|---|---|---|
| Grok Build | grok-build-0.1 | $0.50 | $0.16 | $0.66 |
| OpenAI Codex | gpt-5.3-codex | $0.88 | $1.12 | $2.00 |
| Claude Code (Sonnet) | Sonnet 4.6 | $1.50 | $1.20 | $2.70 |
| Claude Code (Opus) | Opus 4.7 | $2.50 | $2.00 | $4.50 |
Grok Build comes in at $0.66. Codex bills three times that for the identical task, Claude Code on Sonnet a bit over four times, and Opus nearly seven. Run a hundred of those tasks a month and you are choosing between a $66 invoice and a $450 one on Opus. That is real money for a team running agents at volume, and it is the entire reason anyone will look at this model despite the missing benchmarks.
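The per-task table reduces to one line of arithmetic per model: tokens divided by a million, times the rate. The sketch below reproduces it for the 500K-in, 80K-out task, assuming no caching, and also prints the 100-tasks-a-month figure. Model keys are illustrative labels, not API IDs.

```python
# List prices in USD per million tokens: (input, output), from the rate card.
# Keys are descriptive labels, not necessarily the providers' API model IDs.
RATES = {
    "grok-build-0.1": (1.00, 2.00),
    "gpt-5.3-codex": (1.75, 14.00),
    "claude-sonnet-4.6": (3.00, 15.00),
    "claude-opus-4.7": (5.00, 25.00),
}


def task_cost(model: str, input_tokens: int, output_tokens: int) -> tuple[float, float, float]:
    """Return (input cost, output cost, total) in USD for one uncached task."""
    in_rate, out_rate = RATES[model]
    in_cost = input_tokens / 1_000_000 * in_rate
    out_cost = output_tokens / 1_000_000 * out_rate
    return in_cost, out_cost, in_cost + out_cost


for model in RATES:
    i, o, total = task_cost(model, 500_000, 80_000)
    print(f"{model:>18}: in ${i:.2f}  out ${o:.2f}  "
          f"total ${total:.2f}  100 tasks/month ${100 * total:.2f}")
```

The unrounded Codex total is $1.995; the table rounds its $0.875 input cost up to $0.88, which is how it reaches $2.00.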
Caching widens the gap, then narrows it
Coding agents re-send the same context on every turn: the system prompt, the file tree, the files already in view. Prompt caching is what keeps that from bankrupting you. Take a heavier session, say 3M input tokens across forty turns with 70% landing as cache hits, plus 400K output. grok-build-0.1 runs about $2.12, gpt-5.3-codex about $7.54, and Opus 4.7 about $15.55. The output price still dominates, so Grok Build holds its lead.
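The cached-session figures come from splitting the 3M input tokens into 2.1M cache hits and 0.9M fresh tokens, then billing each share at its own rate. This sketch reproduces that split. Like the numbers in the paragraph, it is a simplification: it ignores any cache-write surcharge and long-context pricing tiers, and the model keys are illustrative labels.

```python
# (uncached input, cached input, output) in USD per million tokens, from the rate card.
RATES = {
    "grok-build-0.1": (1.00, 0.20, 2.00),
    "gpt-5.3-codex": (1.75, 0.175, 14.00),
    "claude-opus-4.7": (5.00, 0.50, 25.00),
}


def session_cost(model: str, input_tokens: int, output_tokens: int,
                 cache_hit_rate: float) -> float:
    """USD cost when `cache_hit_rate` of input tokens bill at the cached rate.

    Simplification: ignores cache-write surcharges and long-context tiers.
    """
    fresh, cached, out = RATES[model]
    hits = input_tokens * cache_hit_rate   # tokens billed at the cached rate
    misses = input_tokens - hits           # tokens billed at the full input rate
    return (misses * fresh + hits * cached + output_tokens * out) / 1_000_000


# Forty-turn session from the text: 3M input, 70% cache hits, 400K output.
for model in RATES:
    print(f"{model:>16}: ${session_cost(model, 3_000_000, 400_000, 0.70):.2f}")
```

This prints $2.12, $7.54, and $15.55, matching the figures above.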
One nuance worth flagging if you are input-bound rather than output-bound: on cache hits alone, gpt-5.3-codex at $0.175 is fractionally cheaper than grok-build-0.1 at $0.20, and Anthropic's 10x cache discount brings Opus reads down to $0.50. xAI's cache discount is shallower (5x, not 10x). So a workload that is almost all cached reads and very little generation closes the gap more than the headline numbers suggest. The further your task tilts toward output, the more Grok Build wins.
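A quick way to see where the lead erodes is to hold input at 3M tokens and tilt the workload from output-heavy toward pure cached reads. The sketch below prints how many times grok-build-0.1's bill each rival charges. It uses the same simplified billing as before (no cache-write surcharge, no long-context tiers), and the particular hit rates and output sizes are arbitrary points chosen for illustration.

```python
RATES = {  # (uncached input, cached input, output), USD per million tokens
    "grok-build-0.1": (1.00, 0.20, 2.00),
    "gpt-5.3-codex": (1.75, 0.175, 14.00),
    "claude-opus-4.7": (5.00, 0.50, 25.00),
}


def session_cost(model: str, input_tokens: int, output_tokens: int,
                 cache_hit_rate: float) -> float:
    """Simplified session bill in USD; ignores cache writes and long-context tiers."""
    fresh, cached, out = RATES[model]
    hits = input_tokens * cache_hit_rate
    return ((input_tokens - hits) * fresh + hits * cached + output_tokens * out) / 1_000_000


def cost_ratio(rival: str, input_tokens: int, output_tokens: int,
               cache_hit_rate: float) -> float:
    """How many times grok-build-0.1's cost the rival charges for the same session."""
    return (session_cost(rival, input_tokens, output_tokens, cache_hit_rate)
            / session_cost("grok-build-0.1", input_tokens, output_tokens, cache_hit_rate))


# 3M input tokens throughout; move from output-heavy toward pure cached reads.
for hit_rate, out_tokens in [(0.70, 400_000), (0.95, 400_000), (0.95, 10_000), (1.00, 0)]:
    codex = cost_ratio("gpt-5.3-codex", 3_000_000, out_tokens, hit_rate)
    opus = cost_ratio("claude-opus-4.7", 3_000_000, out_tokens, hit_rate)
    print(f"hits {hit_rate:.0%}, output {out_tokens:>7,}: Codex {codex:.2f}x, Opus {opus:.2f}x")
```

Raising the hit rate alone actually widens Codex's multiple, because output becomes a larger share of every bill. The gap only closes once output shrinks too: at 100% cache hits with no output, Codex falls to about 0.875x (cheaper than grok-build-0.1) while Opus sits at 2.5x.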
The number xAI did not publish
This is where the post would normally have a benchmark table. It does not, because xAI has not given us one. The official announcement has no scores. The model page has no scores. Benchmark trackers list grok-build-0.1 with zero sourced results. For a model whose entire pitch is coding, shipping with no SWE-Bench Verified number is a conspicuous silence.
You will see numbers floating around anyway. Ignore them unless they cite xAI directly. The only legitimate coding score in this lineage belongs to grok-code-fast-1, the predecessor grok-build-0.1 is aliased to, and that model sat in the 57 to 70 percent range on SWE-Bench Verified depending on the harness, well short of the high 80s that Opus 4.7 and GPT-5.5 post. If grok-build-0.1 is a meaningful step up from that, xAI has not said by how much.
We have not put grok-build-0.1 through our own harness yet, and that is rather the point: almost nobody has. The early hands-on writeups, like Kilo's teardown, are still feeling out where it breaks. So the honest read is this: a cheap coding model from a lab with a real track record, unproven on any public eval. Cheap and unproven is a fine bet for low-stakes, high-volume work where a wrong answer costs you a retry. It is a bad bet for the autonomous, merge-without-review agent you point at a production codebase. Until xAI publishes, the price tag is the only hard data point you have.
Who should actually switch
If you run a fleet of coding agents on bulk, forgiving work, such as mass refactors, test generation, doc updates, or first-draft PRs that a human reviews, the 3x to 7x cost gap is hard to ignore. Wire grok-build-0.1 into a router, send it the cheap-to-retry traffic, and keep your eval harness watching the diff quality. The downside of a bad output is a re-run, and you are paying a fraction per run.
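The "a bad output is just a re-run" argument can be made concrete. If each attempt succeeds independently with some probability p, the expected number of attempts is 1/p, so the expected spend per accepted result is the per-task cost divided by p. The sketch below applies that to the uncached per-task totals from the table above. Every success rate in it is hypothetical, since xAI has published no benchmark to ground one. It also ignores reviewer time and latency, which is where the real cost sits for production work. The `Task` type and `route` function are a hypothetical illustration of the router idea, not any vendor's API.

```python
from dataclasses import dataclass

# Uncached per-task totals (USD) for the 500K-in / 80K-out example in the table above.
TASK_COST = {"grok-build-0.1": 0.66, "gpt-5.3-codex": 2.00, "claude-opus-4.7": 4.50}


def expected_cost(model: str, success_rate: float) -> float:
    """Expected spend per accepted result if failed runs are simply retried.

    Assumes attempts succeed independently with probability `success_rate`,
    so the number of attempts is geometric with mean 1 / success_rate.
    """
    if not 0 < success_rate <= 1:
        raise ValueError("success_rate must be in (0, 1]")
    return TASK_COST[model] / success_rate


def break_even_success_rate(cheap: str, strong: str, strong_success: float) -> float:
    """Lowest success rate at which `cheap` costs no more per accepted result."""
    return TASK_COST[cheap] * strong_success / TASK_COST[strong]


@dataclass
class Task:
    description: str
    retryable: bool  # hypothetical flag: cheap to re-run and human-reviewed before merge


def route(task: Task, cheap: str = "grok-build-0.1",
          strong: str = "claude-opus-4.7") -> str:
    """Hypothetical router: forgiving, reviewed work goes to the cheap model."""
    return cheap if task.retryable else strong


# Hypothetical 90% success rate for the stronger model.
for rival in ("gpt-5.3-codex", "claude-opus-4.7"):
    rate = break_even_success_rate("grok-build-0.1", rival, 0.9)
    print(f"vs {rival}: grok-build-0.1 breaks even at a {rate:.1%} success rate")
print(route(Task("regenerate unit tests for utils/", retryable=True)))
```

With that assumed 90% rate for the stronger model, grok-build-0.1 matches it on token spend at a success rate of about 30% against Codex and about 13% against Opus. That margin is why bulk, reviewed work tolerates an unproven model, and why the calculation says nothing about unreviewed production merges.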
If you are shipping autonomous agents against production code, or you need the last several points of SWE-Bench accuracy, stay where you are. Opus 4.7 and gpt-5.3-codex cost more because, on the evidence we actually have, they earn it on the hardest coding tasks. Grok Build has not shown it can keep up there, and finding out the expensive way on a live repo is not worth saving a few dollars per task.
To model this against your own input/output mix, the calculator runs the math for any token split, and the pricing page lists every coding model side by side. For the broader xAI rate picture, see the Grok 4.3 Colossus 2 piece, and for the rival agents, the Claude Code vs Codex cost breakdown.
Sources
- xAI: Introducing Grok Build - May 14, 2026 CLI announcement, features, distribution
- xAI: grok-build-0.1 model page - 256K context, grok-code-fast-1 aliases, above-200K rate note
- xAI: API pricing - $1.00 input, $0.20 cached, $2.00 output; Grok 4.3 $1.25/$2.50
- OpenAI: API pricing - gpt-5.3-codex $1.75/$14.00, GPT-5.5 $5/$30
- Anthropic: Claude pricing - Opus 4.7 $5/$25, Sonnet 4.6 $3/$15, 10x cache discount
- OpenRouter: grok-build-0.1 - May 20, 2026 API listing date corroboration