Model ReleaseMay 19, 2026·9 min read

xAI quietly swapped Grok 4.3's weights for Colossus-2-trained ones and switched on Agent Mode. Same endpoint, same $1.25 input, new build.

Musk posted on May 18 that the latest Grok is now running on weights trained partly on Colossus 2, with Agent Mode unlocked for API customers. The post does not name a new version. The xAI docs still point at grok-4.3. The price card still reads $1.25 input and $2.50 output per million tokens, the same as it has since April 30. On a 1M-token input run that comes out at roughly a quarter of what Opus 4.7 costs and one-seventh of GPT-5.5. The story here is the silent weight swap and the agent unlock, not a SKU bump or a price move.

Dark data center server racks representing xAI Colossus 2 supercomputer infrastructure

Photo by imgix on Unsplash

There is no Grok 4.21. The grok-4.3 endpoint just got new weights underneath it.

The first thing to clear up. Half the Twitter coverage of May 18 called it "Grok 4.21" or "the next Grok." The xAI docs disagree. The model ID is grok-4.3, exactly as it was on April 30 when the model went GA. The OpenRouter listing, the DashScope-equivalent xAI listing, and the LiteLLM canonical entry all still point at grok-4.3. What changed under the hood is the checkpoint behind that name. Musk's wording was that the current build was "partially trained on Colossus 2." That is a weights refresh shipped through the same SKU, not a new model.

Grok 4.4 (1T parameters) and Grok 5 (1.5T parameters) are on the xAI roadmap, but neither has shipped weights. The next time Musk posts about a numbered jump, that will be the actual new model. This was not it.

The other shipment in the same announcement is Agent Mode, which went from limited preview to general API availability. More on that further down. The takeaway right now is that everything in the rest of this post applies to one endpoint, grok-4.3, and one rate card.

The rate card has not moved since April 30

xAI did not touch the price. That is the thing worth pausing on, because it is the part of the announcement that pays.

Tier	Input / 1M	Output / 1M	Notes
grok-4.3 standard	$1.25	$2.50	1M context, flat
grok-4.3 cached input	$0.31	$2.50	~75% off input on cache hits
grok-4.3 batch (24h)	$0.63–$1.00	$1.25–$2.00	50–80% of standard
grok-4.20-multi-agent	$1.25	$2.50	2M context, legacy SKU
grok-4.1-fast	$0.20	$0.50	2M context, budget SKU

One subtle thing in this table. The 2M-context tier did not get the Colossus 2 retrain. It is the same grok-4.20-multi-agent checkpoint from March, still serving customers who need above 1M tokens of context in one shot. If you want the new weights, you are on grok-4.3 and capped at 1M.

What 1M tokens of input costs across the frontier

Real numbers. A 1M-token input with a 10K-token output (the shape of a whole-codebase analysis or a long brief), priced against every published frontier rate card in May 2026.

Model	Input / 1M	Output / 1M	Tiering above threshold	Total for 1M/10K
GPT-5.5	$5.00	$30.00	2x input / 1.5x output above 272K	$8.94
Claude Opus 4.7	$5.00	$25.00	Flat since March 13	$5.25
Gemini 3.1 Pro	$2.00	$12.00	2x input / 1.5x output above 200K	$3.78
Claude Sonnet 4.6	$3.00	$15.00	Flat, 1M context	$3.15
Grok 4.3 (Colossus 2 build)	$1.25	$2.50	Flat, 1M context	$1.28
DeepSeek V4-Pro (promo, until May 31)	$0.435	$0.87	Flat, 1M context	$0.44

On a 1M/10K run, Grok 4.3 costs $1.28. Opus 4.7 costs $5.25, or roughly 4.1x more. GPT-5.5 costs $8.94, or roughly 7x more. The only published rate that beats Grok is DeepSeek V4-Pro at promo pricing, and that promo ends May 31, after which V4-Pro snaps back to $1.74 input / $3.48 output. After June 1, Grok 4.3 becomes the cheapest frontier-tier API outright.

The leverage here is not the 4.1x against Opus. It is that none of the bigger labs have moved to match. Anthropic dropped the long-context surcharge in March. OpenAI brought GPT-5.5 Instant down as the default ChatGPT model. Google trimmed Flash Lite to $0.25 input. Nobody at the top of the frontier shelf has cut on the flagship line in 2026 except xAI.

Agent Mode is the part that actually changes what you can build

The Colossus 2 retrain is a quality bump. Agent Mode is a capability unlock. xAI is shipping a server-side agentic loop that plans, calls tools, runs them, and chains across turns without the developer needing to wire up an orchestrator. The same SKU. The same rate. Tool calls are billed as input and output tokens like any other turn.

The shape overlaps with OpenAI's Responses API and Anthropic's computer use plus tool use. Where xAI is different is that there is no separate price tier for agentic runs. GPT-5.5 priority and Pro tiers sit at $12.50/$75 and $30/$180 respectively, both meant for the kind of long-running agent traffic Agent Mode now serves on Grok at standard rates. Anthropic charges normally for tool use too, but with Opus 4.7 input at $5 the gap to Grok's $1.25 still pencils out at 4x cheaper before any agentic overhead.

The honest caveat. Agent Mode is new. The reliability data is not in. xAI's own τ²-Bench Telecom score for grok-4.3 is 98%, which is competitive with the agentic numbers GPT-5.5 and Opus 4.7 post, but τ²-Bench is one benchmark, not a production track record. If you are routing critical agentic traffic, run a shadow evaluation first.

Where Grok 4.3 lands on intelligence per dollar

Benchmarks for the Colossus 2 build, drawn from xAI's posted numbers and Artificial Analysis's independent measurements. Competitor scores are pulled from each lab's own announcement sheets and AA aggregates, not from a single head-to-head run.

Benchmark	Grok 4.3	Opus 4.7	GPT-5.5	Sonnet 4.6
AA Intelligence Index	53	59	61	51
SWE-Bench Verified	73.0%	87.6%	88.7%	78.2%
τ²-Bench Telecom	98%	96%	97%	92%
IFBench	81%	85%	87%	79%
GDPval-AA (ELO)	1500	1542	1568	1421
Output speed (tok/s)	~207	~98	~140	~155

Reading this table honestly. Grok 4.3 is not the frontier on SWE-Bench Verified. It loses to Opus 4.7 by 14.6 points and to GPT-5.5 by 15.7 points. On the agentic and instruction-following boards (τ²-Bench, IFBench) it is competitive. On the broad-intelligence aggregate (AA Intelligence Index) it sits at 53, below the GPT-5.5 / Opus 4.7 tier and above Sonnet 4.6. The Colossus 2 retrain added 321 ELO points on GDPval-AA over the old Grok 4.20 baseline, which is a meaningful agentic improvement, but it does not close the SWE-Bench gap.

The intelligence-per-dollar story still works because the denominator is so small. Grok at AA 53 and $1.25 input is roughly 0.42 cents per intelligence point. Opus at AA 59 and $5 is roughly 8.5 cents. GPT-5.5 at AA 61 and $5 (under 272K) is roughly 8.2 cents. Grok is buying 87 to 90 percent of the frontier quality for 15 cents on the dollar.

What this means in practice

Three routing reads from the same table.

If the workload is high-volume agentic traffic, structured tool-calling, or long-context analysis where the marginal benchmark points do not justify the marginal cost, Grok 4.3 is the cheapest published frontier option in May 2026. The Agent Mode unlock means you do not need to layer LangGraph or your own orchestrator on top to get planning behavior.

If the workload is code review, autonomous coding agents, SWE-Bench-shaped tasks where the last 15 points of accuracy is what you are paying for, stay on GPT-5.5 or Opus 4.7. The math still does not favor Grok at the very top of the curve. The DeepSeek V4-Pro promo is the only thing cheaper, and only until May 31.

If you are already running a router, Grok 4.3 now slots in between Sonnet 4.6 ($3.15 per 1M/10K) and the budget tier (Haiku 4.5, Gemini 3.1 Flash Lite, GPT-5.4 Mini, all under $1). It is cheaper than Sonnet, faster on output, and has Agent Mode built in. Sonnet still has the better SWE-Bench number and the better Anthropic ecosystem story. Pick on workload shape, not on tier.

For the full price grid and to model this against your actual workload, the pricing page has every frontier rate card side-by-side, and the calculator runs the math for any input/output mix. For the previous Grok 4.3 price-cut story from April 30, see the Grok 4.3 launch piece. For the DeepSeek V4-Pro promo deadline, see the May 31 cliff piece.

Sources

xAI: News - May 18, 2026 Colossus 2 retrain and Agent Mode unlock
xAI: Models documentation - Canonical model IDs and grok-4.3 pricing
Artificial Analysis: Grok 4.3 measurements - Intelligence Index 53, τ²-Bench 98, IFBench 81, GDPval-AA 1500
SemiAnalysis: Colossus 2 infrastructure - 1GW Blackwell-only cluster training the new weights
Basenor: Grok May 18 upgrades - Colossus 2 retrain plus Agent Mode breakdown
VentureBeat: Grok 4.3 April 30 launch - Original $1.25/$2.50 pricing context
OpenAI: API pricing - GPT-5.5 $5/$30 under 272K, 2x input / 1.5x output above
Anthropic: Claude pricing - Opus 4.7 $5/$25, Sonnet 4.6 $3/$15
Google: Gemini API pricing - Gemini 3.1 Pro $2/$12 under 200K, $4/$18 above
DeepSeek: API pricing - V4-Pro promo $0.435/$0.87 through May 31

Compare all model prices Calculate your API cost