Comparison · March 29, 2026 · 9 min read

Kimi K2.5 vs GPT-5.4: the model Cursor built on, and what it actually costs

Cursor launched Composer 2 without saying what was under the hood. A developer intercepted the model ID six hours later. It was a fine-tuned Kimi K2.5 from Moonshot AI -- a model that costs about 5x less than GPT-5.4 per token. Here is the full pricing breakdown and what the benchmarks actually show.

[Image: a code editor on a dark screen showing an AI coding assistant comparing Kimi K2.5 and GPT-5.4. Photo by Ferenc Almasi on Unsplash]

Kimi K2.5 costs $0.60/1M input tokens and $3.00/1M output; GPT-5.4 is $2.50 and $15.00, so input is about 76% cheaper and output runs at a fifth of the cost. Cursor launched Composer 2 on March 19 without naming the base model; a developer found the ID six hours later. On SWE-Bench Verified, Kimi K2.5 scores 76.8% -- a few points behind the frontier -- and with cache hits priced at $0.10/1M, the effective price gap is even wider than the headline rates while the benchmark gap stays small.

How the Kimi story became public

On March 19, 2026, Cursor announced Composer 2 as their best coding model yet. The post described it as the result of a "first continued pretraining run" and a proprietary RL training process. No model attribution.

About six hours later, a developer named @fynnso posted on X that they had modified the OpenAI base URL in Cursor's API calls and caught the raw model ID: kimi-k2p5-rl-0317-s515-fast. The post went viral. Elon Musk replied: "Yeah, it's Kimi 2.5."
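The interception trick works because an OpenAI-compatible client puts the model name in the JSON body of every request, so redirecting the app's base URL at anything that logs request bodies exposes it. A minimal sketch of the extraction step -- the payload below is illustrative, built from the model ID reported in the article:

```python
import json

def extract_model_id(request_body: bytes) -> str:
    """Pull the `model` field out of a captured OpenAI-style
    chat-completions request body."""
    payload = json.loads(request_body)
    return payload.get("model", "<missing>")

# An illustrative captured body, shaped like a standard
# chat-completions request, carrying the raw fine-tune ID.
captured = json.dumps({
    "model": "kimi-k2p5-rl-0317-s515-fast",
    "messages": [{"role": "user", "content": "fix this bug"}],
}).encode()

print(extract_model_id(captured))  # the raw model ID
```

In practice the developer ran a proxy at the substituted base URL and read the ID out of the first forwarded request; the parsing step is all that is shown here.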

Cursor team member Lee Robinson confirmed it the next day, saying Kimi K2.5 was the base they started from and that they were following the Modified MIT License through Fireworks AI as their inference partner. He described himself as "thankful for OSS models personally."

Kimi K2.5 uses a Modified MIT License. The attribution question -- whether Cursor was required to display the base model name given their revenue -- became a separate thread. Several Moonshot employees tagged Cursor publicly, then deleted those posts within hours. Legal involvement was reported. Whatever the resolution, the underlying question for developers is more practical: if Cursor is using a model that costs a fraction of GPT-5.4, should you be too?

What Kimi K2.5 actually is

Kimi K2.5 is from Moonshot AI, a Beijing-based company valued at $4.3 billion. The model was released January 27, 2026 under a Modified MIT License -- meaning the weights are publicly available on Hugging Face and you can run it yourself.

Kimi K2.5 builds on the Kimi K2 base, which uses a Mixture of Experts architecture with 1 trillion total parameters and 32 billion active during inference. That is what makes it cheap to run -- the reasoning capacity of a very large model at the compute cost of a much smaller one. Context window is 256K tokens (262,144 exactly).

It also has an agentic mode that can spawn up to 100 sub-agents with 1,500 total tool calls, running 4.5x faster than a single-agent setup on the same task. Cursor's use of it for a coding assistant is not surprising -- it was designed for exactly that kind of multi-step autonomous work.

API pricing comparison

Kimi K2.5 is not just slightly cheaper than GPT-5.4. It is in a different pricing bracket. Here is how the numbers compare:

| Model | Input / 1M | Cached input / 1M | Output / 1M | Context |
|---|---|---|---|---|
| Kimi K2.5 | $0.60 | $0.10 | $3.00 | 256K |
| GPT-5.4 | $2.50 | $0.25 | $15.00 | 1.05M |
| Claude Sonnet 4.6 | $3.00 | $0.30 | $15.00 | 200K |
Sources: platform.kimi.ai · tokencost.app/pricing

What this costs at real usage volumes

Coding workloads generate a lot of output -- generated code, file diffs, explanations. A 50/50 input/output split is a reasonable estimate for a coding assistant. Here is what that looks like across different usage levels:

| Monthly volume | Kimi K2.5 | GPT-5.4 | Savings |
|---|---|---|---|
| 10M tokens (solo dev) | $18 | $87.50 | $69.50 (80% less) |
| 50M tokens (agent loop) | $90 | $437.50 | $347.50 |
| 100M tokens (startup team) | $180 | $875 | $695 (about a fifth the cost) |
| 1B tokens (enterprise pipeline) | $1,800 | $8,750 | $6,950 |

Cache-miss input pricing ($0.60/1M), 50/50 input/output split. With prompt caching enabled, Kimi K2.5 input drops to $0.10/1M and the savings increase further.
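The table's arithmetic is easy to reproduce. This sketch assumes the flat per-1M rates quoted above and a 50/50 input/output split; `cached_share` models the fraction of input tokens served from the prompt cache:

```python
def monthly_cost(total_tokens: float, in_rate: float, out_rate: float,
                 output_share: float = 0.5, cached_share: float = 0.0,
                 cache_rate: float = 0.0) -> float:
    """Estimate monthly API cost in dollars.

    Rates are $/1M tokens. output_share is the fraction of tokens that
    are output; cached_share is the fraction of *input* tokens billed
    at cache_rate instead of in_rate.
    """
    out_tok = total_tokens * output_share
    in_tok = total_tokens - out_tok
    cached = in_tok * cached_share
    fresh = in_tok - cached
    return (fresh * in_rate + cached * cache_rate + out_tok * out_rate) / 1e6

# Rates from the pricing table above.
KIMI = dict(in_rate=0.60, out_rate=3.00, cache_rate=0.10)
GPT = dict(in_rate=2.50, out_rate=15.00, cache_rate=0.25)

print(monthly_cost(100e6, **KIMI))                    # ~$180, no caching
print(monthly_cost(100e6, **GPT))                     # ~$875
print(monthly_cost(100e6, cached_share=0.8, **KIMI))  # heavy caching: ~$160
```

Changing `output_share` shows how sensitive the comparison is to workload shape: output-heavy agent loops benefit most from Kimi's $3.00 output rate.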

How the coding benchmarks compare

One thing to flag: Moonshot's official benchmark table compares Kimi K2.5 against GPT-5.2, not GPT-5.4. There is no published direct comparison with GPT-5.4 yet. With that caveat clearly stated, here is what the numbers show:

| Benchmark | Kimi K2.5 | GPT-5.2 (xhigh) | Claude Opus 4.5 ET |
|---|---|---|---|
| SWE-Bench Verified | 76.8% | 80.0% | 80.9% |
| SWE-Bench Multilingual | 73.0% | 72.0% | 77.5% |
| LiveCodeBench v6 | 85.0% | -- | 82.2% |
| Terminal-Bench 2.0 | 50.8% | 54.0% | 59.3% |
| HLE-Full w/ tools | 50.2% | 45.5% | 43.2% |

Source: Moonshot AI official benchmark post. ET = Extended Thinking mode.

The pattern is consistent: Kimi K2.5 is slightly behind on English-language single-repo tasks (SWE-Bench Verified), ahead on multilingual code, and competitive on agentic tool use. It falls behind on terminal-intensive tasks.

Cursor's RL fine-tuning pushed their internal CursorBench score from 44.2% (Composer 1.5) to 61.3% (Composer 2). That is what targeted training buys you. The vanilla Kimi K2.5 numbers are the floor for this architecture, not the ceiling.

When the price gap actually matters

The 5x price difference does not make this a simple swap. A few things to consider before switching a production coding pipeline:

Kimi K2.5 has a 256K context window; GPT-5.4's is 1.05 million tokens. If your coding assistant needs to hold large codebases in context -- multiple files, long conversation history, full repo contents -- GPT-5.4 has a meaningful advantage. 256K is enough for most tasks, but it will hit limits on large repos that a 1M window handles comfortably.
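A quick way to sanity-check whether a repo fits is to count characters and divide by roughly 4 -- a common chars-per-token heuristic, not a real tokenizer count, so treat the result as an estimate only. A sketch:

```python
from pathlib import Path

CHARS_PER_TOKEN = 4  # rough heuristic; real tokenizers vary by content

def estimate_repo_tokens(root: str, exts=(".py", ".ts", ".md")) -> int:
    """Rough token estimate for all matching files under a repo root."""
    total_chars = sum(
        len(p.read_text(errors="ignore"))
        for p in Path(root).rglob("*")
        if p.is_file() and p.suffix in exts
    )
    return total_chars // CHARS_PER_TOKEN

def fits(tokens: int, context: int, reserve: int = 32_000) -> bool:
    """Leave headroom for conversation history and model output."""
    return tokens + reserve <= context

KIMI_CONTEXT = 262_144
GPT_CONTEXT = 1_050_000  # approximate, per the article
```

A repo estimated at 500K tokens clears GPT-5.4's window with room to spare but would need chunking or retrieval to work inside Kimi K2.5's 256K.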

Open weights change the compliance calculus entirely. You can run it on your own infrastructure -- no API calls at all if you have the hardware. At 32B active parameters, it runs on A100-class GPUs. Teams that cannot send code to external APIs have a real option here that GPT-5.4 simply is not.

For multilingual codebases, Kimi K2.5 is the more interesting choice on current benchmarks. Its 73.0% on SWE-Bench Multilingual edges GPT-5.2 at 72.0%. Cursor noticed this and specifically trained their RL layer on multilingual repos.

On English-language codebases, GPT-5.4 and Claude Opus 4.x are a few points ahead on SWE-Bench Verified. Whether that gap is worth 5x the cost depends on your error tolerance. A 3-point benchmark difference probably means different things at $18/month vs $1,800/month.

How to access Kimi K2.5

The API is available at platform.kimi.ai with an OpenAI-compatible format, which makes switching straightforward from existing GPT integrations. The model ID is kimi-k2.5.
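Because the format is OpenAI-compatible, switching an existing integration mostly means changing the endpoint and the model name. A stdlib-only sketch of building such a request -- note the `/v1/chat/completions` path is an assumption from the OpenAI convention, since the article only gives the host and model ID:

```python
import json
from urllib import request

# Assumed endpoint path; the article confirms only the host and model ID.
KIMI_URL = "https://platform.kimi.ai/v1/chat/completions"

def build_chat_request(prompt: str, model: str = "kimi-k2.5",
                       url: str = KIMI_URL,
                       api_key: str = "YOUR_API_KEY") -> request.Request:
    """Build an OpenAI-format chat request aimed at Kimi's endpoint.

    Switching back to GPT-5.4 means changing only `url` and `model`;
    the body and headers are identical.
    """
    body = json.dumps({
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }).encode()
    return request.Request(
        url,
        data=body,
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
    )

req = build_chat_request("Explain this stack trace")
print(req.full_url)
```

Sending it is one `request.urlopen(req)` call, or the official OpenAI SDK with its `base_url` parameter pointed at the same host.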

The weights are on Hugging Face under moonshotai if you want to self-host. At 32B active parameters with MoE routing, it is considerably lighter than its 1T total parameter count suggests.

Cursor also lists it as a model in their UI at $0.50/1M input and $2.50/1M output -- slightly below the direct API price, presumably reflecting their volume deal with Fireworks as inference partner. Either way, the bundled context management and IDE tooling are part of what you are paying for.

The short version

Kimi K2.5 is not better than GPT-5.4 on the benchmarks that matter most for English-language coding. On multilingual repos and agentic tool use, it competes. It costs about 5x less and runs on your own hardware if you need that. Cursor built a successful coding product on it with additional RL training, which shows the gap is closable. Whether the savings justify the tradeoffs depends on your volume, your compliance requirements, and how much the benchmark difference matters for your specific use case.
