Cursor Composer 2.5 lands within a point of Opus 4.7 on SWE-Bench Multilingual and bills ten times less. The promo just ended.
On May 18 Cursor shipped its own coding model and gave everyone a week of double usage to stress-test it. That promo ended around May 25, so the rates you are paying right now are the real ones: $0.50 input and $2.50 output per million tokens on Standard, $3.00 and $15.00 on Fast. Artificial Analysis put it third on the Coding Agent Index, behind Opus 4.7 and GPT-5.5, at roughly a tenth of their output price. The interesting question is no longer whether the model works. It does. The question is whether a hosted-only coding agent with no public API is the right place to send your team's real work.

Ten times cheaper, one point behind, hosted only
Three things you need before reading any further. Composer 2.5 Standard at $0.50 input and $2.50 output is sub-frontier pricing: 10x under Opus 4.7 and 12x under GPT-5.5 on output. On Artificial Analysis's Coding Agent Index it lands at 62, three points below GPT-5.5 and four below Opus 4.7. And the catch worth saying twice: the model lives inside Cursor and only inside Cursor. There is no REST endpoint for it. If your stack is not Cursor, this is a story you are reading, not a model you can buy.
The launch-week 2x usage promo expired around May 25. Bills coming in this week are the first ones at full rates. Worth a glance at the invoice before you assume nothing changed.
Standard, Fast, and what they actually mean
Cursor publishes Composer 2.5 in two flavors. Both are the same underlying model. The difference is the latency budget Cursor is willing to throw at your request and, on the rate card, how much they charge you for it.
| Tier | Input / 1M | Output / 1M | Avg minutes/task | Where it lives |
|---|---|---|---|---|
| Composer 2.5 Standard | $0.50 | $2.50 | 9.3 | Cursor only |
| Composer 2.5 Fast | $3.00 | $15.00 | 6.7 | Cursor only |
| Kimi K2.5 (base model) | $0.60 | $3.00 | — | Moonshot API |
Standard is what you should default to. It is the cheap one, and unless your team is sitting waiting for the model to think, the 2.6 extra minutes per task is invisible because you are working on something else in another tab. Fast exists for the people who want a generation in front of them before they can context-switch. That is a real preference, but satisfying it costs a 6x markup on both input and output.
Note that Cursor undercuts the upstream model on Standard, which is unusual. The Kimi K2.5 base lists at $0.60 input and $3.00 output on Moonshot's own API, so Standard sits about 17 percent below the list price of the model it is built on. List price is not Cursor's serving cost, but pricing below the upstream rate card suggests Standard is being subsidized to drive adoption. That is fine while it lasts. It also tells you something about where the rates could land six months in.
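The tier arithmetic in this section is easy to check by hand, but a few lines of Python make it repeatable if the rate card changes. This is a minimal sketch using only the numbers from the table above; the dictionary keys and helper names are made up for illustration.

```python
# Rates in USD per million tokens and average minutes per task,
# copied from the tier table above. Keys are illustrative labels.
RATES = {
    "standard": {"input": 0.50, "output": 2.50, "minutes": 9.3},
    "fast": {"input": 3.00, "output": 15.00, "minutes": 6.7},
    "kimi_k2_5": {"input": 0.60, "output": 3.00},
}


def ratio(a: str, b: str, field: str) -> float:
    """How many times larger tier a's value is than tier b's for one field."""
    return RATES[a][field] / RATES[b][field]


def discount_pct(cheaper: str, pricier: str, field: str) -> float:
    """Percent by which `cheaper` undercuts `pricier` on one field."""
    return (1 - RATES[cheaper][field] / RATES[pricier][field]) * 100


fast_markup_input = ratio("fast", "standard", "input")
fast_markup_output = ratio("fast", "standard", "output")
minutes_saved = RATES["standard"]["minutes"] - RATES["fast"]["minutes"]
undercut_input = discount_pct("standard", "kimi_k2_5", "input")
undercut_output = discount_pct("standard", "kimi_k2_5", "output")

print(f"Fast markup: {fast_markup_input:.1f}x input, {fast_markup_output:.1f}x output")
print(f"Minutes saved per task on Fast: {minutes_saved:.1f}")
print(f"Standard vs Kimi list: {undercut_input:.1f}% lower input, {undercut_output:.1f}% lower output")
```

The markup is the same on input and output, so the Fast premium does not depend on how output-heavy your sessions are.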
How it stacks against the models you would have used instead
Compare like for like. Below is what the frontier and second-tier coding models cost per million tokens, pulled from each provider's current pricing page on May 27, 2026.
| Model | Input / 1M | Output / 1M | AA Coding Index |
|---|---|---|---|
| Composer 2.5 Standard | $0.50 | $2.50 | 62 (#3) |
| Claude Opus 4.7 | $5.00 | $25.00 | 66 (#1) |
| GPT-5.5 | $5.00 | $30.00 | 65 (#2) |
| Gemini 3.1 Pro | $2.00 | $12.00 | ~58 |
| Grok Build 0.1 | $1.00 | $2.00 | not scored |
| DeepSeek V4-Pro | $0.435 | $0.87 | ~54 |
Composer 2.5 Standard sits in a strange spot. On output, the number that drives every coding-agent bill, $2.50 is a tenth of Opus 4.7 and a twelfth of GPT-5.5. Yet the benchmark gap is four and three points respectively. Grok Build and DeepSeek V4-Pro are cheaper still and have either no benchmark or a clearly lower one. Composer 2.5 is the first model to sit on the Pareto frontier between Anthropic-tier output quality and DeepSeek-tier output pricing, and that is the entire pitch.
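The Pareto claim can be checked against the table directly. The sketch below treats output price (lower is better) and AA Coding Index (higher is better) as the two axes and keeps every model that no other model beats on both. It is an illustration, not Artificial Analysis's method: the class and function names are made up, the "~" scores are taken at face value, and Grok Build 0.1 is left out because it has no score.

```python
from typing import NamedTuple


class Model(NamedTuple):
    name: str
    output_price: float  # USD per million output tokens
    score: float         # AA Coding Index; "~" values in the table are approximate


# Scored models only; Grok Build 0.1 is omitted because it has no index score.
MODELS = [
    Model("Composer 2.5 Standard", 2.50, 62),
    Model("Claude Opus 4.7", 25.00, 66),
    Model("GPT-5.5", 30.00, 65),
    Model("Gemini 3.1 Pro", 12.00, 58),
    Model("DeepSeek V4-Pro", 0.87, 54),
]


def dominates(a: Model, b: Model) -> bool:
    """a dominates b if it is no worse on both axes and strictly better on one."""
    no_worse = a.output_price <= b.output_price and a.score >= b.score
    strictly_better = a.output_price < b.output_price or a.score > b.score
    return no_worse and strictly_better


def pareto_frontier(models: list[Model]) -> list[Model]:
    """Models that no other model dominates, sorted by output price."""
    frontier = [m for m in models if not any(dominates(o, m) for o in models)]
    return sorted(frontier, key=lambda m: m.output_price)


frontier_names = [m.name for m in pareto_frontier(MODELS)]
print(frontier_names)
```

On these numbers the frontier is DeepSeek V4-Pro, Composer 2.5 Standard, and Claude Opus 4.7. Gemini 3.1 Pro drops out because Composer is both cheaper and higher scoring, and GPT-5.5 drops out because Opus 4.7 is.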
What a real coding session costs on each
Two scenarios. The first is a typical session: agent reads 500K tokens of context, writes 50K of output, hits the model maybe ten times. The second is an agent-heavy session with 5M input across a long-running task, of which 80 percent are cache hits because the file tree and system prompt repeat, plus 200K output. The cost math tells you where the gap actually is.
| Model | Typical (500K in / 50K out) | Agent-heavy (5M in, 80% cached / 200K out) |
|---|---|---|
| Composer 2.5 Standard | $0.38 | $3.00 |
| Composer 2.5 Fast | $2.25 | $18.00 |
| Claude Opus 4.7 | $3.75 | $12.00 |
| GPT-5.5 | $4.00 | $13.00 |
Typical session: 38 cents on Composer 2.5 Standard, $3.75 on Opus 4.7. That is a 10x ratio. Run a hundred of those a month and it is $38 versus $375. Run a thousand and it is $380 versus $3,750. The absolute numbers are small but the multiplier is real, and at the scale teams actually do this, the bill moves.
Agent-heavy session: the gap closes. Anthropic's 10x cache discount cuts Opus 4.7's cached-read cost to $0.50 per million, so the 4M cached tokens cost $2, the 1M uncached tokens cost $5, and the 200K of output costs another $5, for $12 in total. The GPT-5.5 figure of $13 works out only if its cached reads are also billed at $0.50 per million. Composer 2.5 does not publish a cache discount, so all 5M of its input bills at the full $0.50, putting the session at $3. Cheaper still, but the multiplier dropped from 10x to 4x. The longer your sessions and the heavier the cache reuse, the less Composer 2.5 wins by. The shorter and more output-bound, the more it dominates. Worth modeling against your own traffic before assuming the 10x carries over.
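To model this against your own traffic, the two scenarios reduce to one formula: uncached input at the input rate, cached input at the cached rate, output at the output rate. The sketch below is one way to write that down. The names are illustrative, the Opus 4.7 cached rate comes from the 10x discount described above, and the GPT-5.5 cached rate of $0.50 is an assumption inferred from the table's $13.00 figure rather than quoted from OpenAI.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Rate:
    input_price: float                     # USD per million uncached input tokens
    output_price: float                    # USD per million output tokens
    cached_price: Optional[float] = None   # USD per million cached tokens; None = no discount


RATES = {
    "Composer 2.5 Standard": Rate(0.50, 2.50),
    "Composer 2.5 Fast": Rate(3.00, 15.00),
    "Claude Opus 4.7": Rate(5.00, 25.00, cached_price=0.50),
    # Assumption: inferred from the table's $13.00 agent-heavy figure, not a quoted rate.
    "GPT-5.5": Rate(5.00, 30.00, cached_price=0.50),
}


def session_cost(rate: Rate, input_tokens: int, output_tokens: int,
                 cache_hit_ratio: float = 0.0) -> float:
    """Cost in USD for one session, given token counts and a cache hit ratio in [0, 1]."""
    cached = input_tokens * cache_hit_ratio
    uncached = input_tokens - cached
    # Without a published discount, cached tokens bill at the full input rate.
    cached_price = rate.input_price if rate.cached_price is None else rate.cached_price
    total = (uncached * rate.input_price
             + cached * cached_price
             + output_tokens * rate.output_price)
    return total / 1_000_000


SCENARIOS = {
    "typical": dict(input_tokens=500_000, output_tokens=50_000, cache_hit_ratio=0.0),
    "agent_heavy": dict(input_tokens=5_000_000, output_tokens=200_000, cache_hit_ratio=0.8),
}

for name, rate in RATES.items():
    costs = {label: session_cost(rate, **params) for label, params in SCENARIOS.items()}
    print(f"{name}: typical ${costs['typical']:.2f}, agent-heavy ${costs['agent_heavy']:.2f}")
```

Swap in your own token counts and cache hit ratio in `SCENARIOS` to see where your workload falls between the 10x and 4x cases.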
The benchmarks, not the marketing version
Cursor published three benchmark numbers in the launch post and one external party, Artificial Analysis, ran their own harness against it. The picture is consistent across all four.
| Benchmark | Composer 2.5 | Opus 4.7 | GPT-5.5 |
|---|---|---|---|
| SWE-Bench Multilingual | 79.8% | 80.5% | not run |
| Terminal-Bench 2.0 | 69.3% | 69.4% | 82.7% |
| CursorBench v3.1 | 63.2% | 64.8% (adaptive) | not run |
| AA Coding Agent Index | 62 | 66 | 65 |
On the two evals where both Cursor and Anthropic ran their own numbers, the gap is under a point. On CursorBench, which is Cursor's house benchmark and therefore deserves a small grain of salt, it is 1.6 points behind Opus 4.7's adaptive setup. Artificial Analysis's independent harness puts the gap at four points. Whichever reference you pick, the answer is that Composer 2.5 is in the same tier as the frontier coding models on accuracy. There is one weak spot: Terminal-Bench specifically, where GPT-5.5 with high reasoning effort is about 13 points ahead of both Composer 2.5 and Opus 4.7. If your agent spends most of its time poking at a real shell, GPT-5.5 still has a real advantage there.
See Artificial Analysis's own Coding Agent Index breakdown for the full per-task cost figures: Composer 2.5 Standard came in at 7 cents per task, Opus 4.7 at $4.10, GPT-5.5 at $4.82.
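For a sense of scale, the per-task figures quoted above can be turned into multiples of the Composer 2.5 Standard cost. This is a small sketch over those three numbers only; the dictionary and function names are illustrative.

```python
# Per-task cost in USD, as quoted from Artificial Analysis in the paragraph above.
COST_PER_TASK = {
    "Composer 2.5 Standard": 0.07,
    "Claude Opus 4.7": 4.10,
    "GPT-5.5": 4.82,
}


def cost_multiple(model: str, baseline: str = "Composer 2.5 Standard") -> float:
    """How many times more a task costs on `model` than on `baseline`."""
    return COST_PER_TASK[model] / COST_PER_TASK[baseline]


for model in ("Claude Opus 4.7", "GPT-5.5"):
    print(f"{model}: {cost_multiple(model):.0f}x the per-task cost of Composer 2.5 Standard")
```

Both multiples come out well above the 10x to 12x gap in list prices. The quoted figures alone do not say why, so treat the per-task numbers as specific to that harness.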
About the base model: it is Moonshot's Kimi
Cursor confirmed in March, after researchers fingerprinted the model from Cursor's API traffic, that Composer is built on Moonshot's Kimi K2 family. Composer 2.5 specifically is based on the Kimi K2.5 open-weight checkpoint, a 1 trillion parameter mixture-of-experts with 32 billion parameters active per token. The base license is Modified MIT, which is what made the lift possible.
Cursor's own claim is that roughly a quarter of the final-model compute came from K2.5, with the rest being their own RL on top. Inference is hosted on Fireworks AI. The functional consequence: the 256K context window is inherited from K2.5. That is about a quarter of Opus 4.7's 1M and slightly under GPT-5.5's 272K, which is enough for a typical coding agent session.
None of this matters if you only ever use Composer 2.5 inside Cursor, which is the only way to use it. It matters a lot if you wanted to run the same model directly. You cannot run Composer 2.5 directly, but you can run Kimi K2.5 on Moonshot's API at roughly comparable prices, minus the Cursor post-training that seems to actually move the benchmarks.
Where this fits if you are not already on Cursor
Composer 2.5 is a strong reason to be on Cursor and not a reason to switch off your existing API stack, because you literally cannot switch to it without switching to Cursor first. For teams already paying Cursor Pro ($20) or Ultra ($200), this is mostly upside: a faster, cheaper default model that draws from a separate usage pool and stretches your subscription further. For teams running their own agent infra on raw provider APIs, the equivalent move is picking up Kimi K2.5 directly at $0.60 input and $3 output, which gives you most of the price advantage and lets you keep your stack.
The pattern from this launch is the bigger story. IDE companies that started as wrappers around frontier APIs are now training their own coding heads and undercutting the providers underneath. Cursor has done it. Replit has hinted at it. Windsurf will follow. The margin in being a thin client over Anthropic and OpenAI is gone, and the next phase is everyone owning a piece of inference. Which is bad news for nobody except the wrapper margin.
Who should care, and what to do this week
If you are on Cursor and have not switched the default model: try Standard for a day. The 6.7 minute Fast tier is the marketing highlight, but Standard is the one that delivers the actual price advantage, and the latency difference disappears as soon as you switch tabs. If your team is on Claude Code or Codex CLI and considering Cursor for cost reasons, the math says Cursor is cheaper if your Cursor seat is already paid for. If you would have to add a new seat, do the seat math first; a $20 Pro plan plus Composer 2.5 usage often beats the equivalent API spend, but only once you are actually running a few hundred tasks a month.
Model the comparison against your own input-to-output ratio with the calculator, or view every coding model side by side on the pricing page. For context on how Cursor got here, see the earlier piece on Kimi K2.5 versus GPT-5.4 on coding cost. For the alternative agents, see the Claude Code vs Codex cost breakdown.
Sources
- Cursor: Introducing Composer 2.5 - May 18, 2026 launch post, Standard/Fast pricing, internal benchmarks
- Cursor: Composer 2.5 model docs - usage pool behavior on Pro/Ultra, plan packaging
- Artificial Analysis: Composer 2.5 on the Coding Agent Index - independent benchmark + per-task cost numbers
- TechCrunch: Cursor confirms Kimi base - March 2026 confirmation that Composer is built on Moonshot's Kimi
- Moonshot: Kimi K2.5 API pricing - $0.60 input, $3.00 output, $0.10 cache-hit
- Anthropic: Claude pricing - Opus 4.7 $5/$25, 10x cache discount
- OpenAI: API pricing - GPT-5.5 $5/$30
- Cursor: Subscription pricing - Pro $20, Ultra $200, usage pool structure