Comparison · June 17, 2026 · 6 min read

GLM 5.2 and Kimi K2.7 Code shipped a day apart. You still cannot benchmark one against the other.

Two Chinese labs put out open-weight coding models in the same week. Both are built to drive agents, both undercut GPT-5.5 on price, and both want to be the model you point Claude Code at. Kimi K2.7 Code is the cheaper one. GLM 5.2 gives you four times the context. The thing neither will give you is a number you can check.


Moonshot shipped Kimi K2.7 Code on June 12. Z.ai shipped GLM 5.2 on June 13, then quietly published its per-token price and open weights around June 16. For a few days it looked like only one of these had a rate card. Now both do, which makes the comparison everyone wanted finally possible on the one axis it was ever going to be possible on: cost.

We covered each on its own when it landed: the Kimi K2.7 Code write-up and the GLM 5.2 launch piece, the latter written before Z.ai posted a token price at all. This is the head-to-head. It is shorter than you might expect, because the two models agree on most of what matters and disagree on exactly two things you can put a number to.

The rate card: same output, different input

Start with the prices, because they are the only fully verified facts in this whole comparison. Both quote in dollars per million tokens, both on their own first-party APIs.

| Model | Input / 1M | Output / 1M | Cached input |
| --- | --- | --- | --- |
| Kimi K2.7 Code | $0.95 | $4.00 | $0.19 |
| GLM 5.2 | $1.40 | $4.40 | ~$0.26 |

The output rates are close enough to call a tie, 40 cents apart per million. The real gap is on input, where GLM 5.2 charges 45 cents more, about 47% over Kimi. GLM 5.2 held the exact rate of GLM 5.1, so this is not a price hike on Z.ai's side so much as Moonshot simply pricing reads cheaper. One caveat on the cached column: Kimi's $0.19 cache-hit rate is documented, while GLM 5.2's ~$0.26 was reported around launch but could not be traced to a primary Z.ai page, so treat it as provisional.
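To check the gaps quoted above, here is a minimal sketch that recomputes them from the rate card. The dictionary layout and the `premium` helper are illustrative names, not anything from either vendor, and the GLM 5.2 cached figure is the provisional ~$0.26.

```python
# Rate card from the table above, in USD per million tokens.
# The GLM 5.2 cached rate is the provisional ~$0.26 figure.
RATES = {
    "Kimi K2.7 Code": {"input": 0.95, "output": 4.00, "cached": 0.19},
    "GLM 5.2": {"input": 1.40, "output": 4.40, "cached": 0.26},
}


def premium(kind):
    """Return (absolute gap in USD, relative gap in %) of GLM 5.2 over Kimi."""
    kimi = RATES["Kimi K2.7 Code"][kind]
    glm = RATES["GLM 5.2"][kind]
    return round(glm - kimi, 2), round((glm - kimi) / kimi * 100, 1)


for kind in ("input", "output", "cached"):
    gap_usd, gap_pct = premium(kind)
    print(f"{kind}: +${gap_usd:.2f} per 1M ({gap_pct}% over Kimi)")
```

On these rates the input gap comes out at 47.4% against a 10% gap on output, which is why the rest of the comparison leans on the input column.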

Why a 45-cent input gap turns into real money

Coding agents are lopsided. They read constantly, re-loading files, diffs, and tool output on every step, and they write comparatively little. So the input rate is the one that compounds. Here is the same workload at two intensities: a lighter month for someone using an agent in bursts, and a heavy one where it runs all day.

| Monthly usage | Kimi K2.7 Code | GLM 5.2 | You pay extra for GLM |
| --- | --- | --- | --- |
| 10M in / 2M out | $17.50 | $22.80 | +$5.30 |
| 50M in / 10M out | $87.50 | $114.00 | +$26.50 |

So GLM 5.2 runs about 30% more at either volume, since both rows keep the same 5:1 input-to-output ratio. Not a chasm, and well inside the range where output quality could justify it, if you knew the output quality. That is the catch, and it is the next section. Want to run your own split? The cost calculator takes any input and output volume.
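The monthly figures can be reproduced with a few lines. This is a sketch, not the site's calculator: `monthly_cost` and its `cache_hit` parameter are hypothetical names. The table's numbers correspond to `cache_hit=0.0`, meaning every input token is billed at the full input rate.

```python
# USD per million tokens, from the rate card. GLM's cached rate is provisional.
KIMI = {"input": 0.95, "output": 4.00, "cached": 0.19}
GLM = {"input": 1.40, "output": 4.40, "cached": 0.26}


def monthly_cost(rates, input_m, output_m, cache_hit=0.0):
    """Monthly bill in USD.

    input_m and output_m are volumes in millions of tokens.
    cache_hit is an assumed share (0.0 to 1.0) of input tokens
    billed at the cached rate instead of the full input rate.
    """
    fresh = input_m * (1 - cache_hit) * rates["input"]
    cached = input_m * cache_hit * rates["cached"]
    return round(fresh + cached + output_m * rates["output"], 2)


for input_m, output_m in [(10, 2), (50, 10)]:
    kimi = monthly_cost(KIMI, input_m, output_m)
    glm = monthly_cost(GLM, input_m, output_m)
    print(f"{input_m}M in / {output_m}M out: "
          f"Kimi ${kimi:.2f}, GLM ${glm:.2f}, extra ${glm - kimi:.2f}")
```

Passing a non-zero `cache_hit` shows how far prompt caching shrinks both bills. The right value depends on how your agent structures its prompts, so measure your own hit rate rather than guessing one.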

The benchmark that would settle this does not exist

Here is the part that should make you cautious about either model. To pick the pricier one on merit, you would want a coding benchmark both have run. There isn't one. GLM 5.2 eventually posted official scores, all run on Z.ai's own setup. Kimi K2.7 Code posted nothing on the public suites, only Moonshot's in-house benchmarks with names you have never seen on a leaderboard.

| Benchmark | GLM 5.2 | Kimi K2.7 Code |
| --- | --- | --- |
| SWE-Bench Pro | 62.1 | Not run |
| Terminal-Bench 2.1 | 81.0 | Not run |
| DeepSWE | 46.2 | Not run |
| Public coding suite | Self-reported | None published |

GLM 5.2's numbers look strong, and the 62.1 on SWE-Bench Pro would edge out most things short of Claude Fable 5 if it holds up under an independent run. But "if it holds up" is doing a lot of work, because vendor scaffolds tend to flatter their own models and nobody has reproduced these yet. Developers using Kimi have been blunter still: a VentureBeat piece rounded up practitioners saying the in-house gains don't obviously show up in real work.

The honest read: GLM 5.2 has shown its homework and Kimi has not, but both sets of homework are self-graded. If a benchmark matters to your decision, the only one that counts here is the one you run yourself on your own repo.

Where they genuinely differ: context and shape

The second number you can actually compare is context length, and the gap is wide. GLM 5.2 carries a 1M-token window; Kimi K2.7 Code stops at 256K. For most agent loops 256K is plenty, but if you do monorepo-scale work where the model needs to hold a large slice of the tree at once, that 4x headroom on GLM is the single strongest reason to pay its premium.
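A rough way to see which side of the 256K line your own repository falls on is to estimate its token count. The sketch below rests on loud assumptions: about 4 characters per token is a generic heuristic, not either model's tokenizer; the windows are taken as round 1,000,000 and 256,000 tokens; and the `reserve` for instructions and output is an arbitrary illustrative figure.

```python
import os

# Context windows from the article, taken as round numbers (an assumption).
CONTEXT_WINDOWS = {"GLM 5.2": 1_000_000, "Kimi K2.7 Code": 256_000}
CHARS_PER_TOKEN = 4  # generic heuristic, NOT either model's real tokenizer


def estimate_tokens(num_chars):
    """Very rough token estimate from a character count."""
    return num_chars // CHARS_PER_TOKEN


def repo_chars(root, extensions=(".py", ".ts", ".go")):
    """Sum the characters of source files under root with the given extensions."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if not name.endswith(extensions):
                continue
            path = os.path.join(dirpath, name)
            try:
                with open(path, encoding="utf-8", errors="ignore") as handle:
                    total += len(handle.read())
            except OSError:
                continue  # unreadable file: skip it rather than fail
    return total


def models_that_fit(num_tokens, reserve=32_000):
    """Models whose window holds num_tokens plus a reserve for prompt and output."""
    return [model for model, window in CONTEXT_WINDOWS.items()
            if num_tokens + reserve <= window]


if __name__ == "__main__":
    tokens = estimate_tokens(repo_chars("."))
    print(f"~{tokens:,} tokens; fits in: {models_that_fit(tokens) or 'neither'}")
```

Treat the result as a first pass only. A real tokenizer count on the files your agent actually loads will be more reliable than a characters-per-token guess.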

| Spec | GLM 5.2 | Kimi K2.7 Code |
| --- | --- | --- |
| Context window | 1M tokens | 256K tokens |
| Parameters | 753B total (MoE) | 1T total, 32B active |
| Weights | Open, MIT | Open, Modified MIT |
| Endpoint | Anthropic-compatible | Moonshot API |

Both are open weight, which is the quiet headline. GLM 5.2 under MIT and Kimi under Modified MIT both let you pull the model and run it on your own iron, so the list price is a ceiling, not a floor. That alone sets this pair apart from GPT-5.5 or Claude Opus 4.8, where the meter is the only door in. One note on the Kimi price specifically: its Moonshot-direct rate is $0.95/$4.00, but OpenRouter has listed it lower at $0.74/$3.50, so check which endpoint you are actually billing against before you quote yourself a number.
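To see how much the endpoint choice moves the Kimi bill, here is a small sketch using the two rates quoted above. The `bill` helper and dictionary keys are illustrative, and the OpenRouter figure is the listing mentioned in this article, which may have changed since.

```python
# Kimi K2.7 Code rates in USD per million tokens: (input, output).
# The OpenRouter figure is the listing quoted in the article and may have changed.
KIMI_ENDPOINTS = {
    "Moonshot direct": (0.95, 4.00),
    "OpenRouter": (0.74, 3.50),
}


def bill(endpoint, input_m, output_m):
    """Monthly USD for input_m / output_m million tokens on the given endpoint."""
    input_rate, output_rate = KIMI_ENDPOINTS[endpoint]
    return round(input_m * input_rate + output_m * output_rate, 2)


for input_m, output_m in [(10, 2), (50, 10)]:
    direct = bill("Moonshot direct", input_m, output_m)
    routed = bill("OpenRouter", input_m, output_m)
    print(f"{input_m}M in / {output_m}M out: direct ${direct:.2f}, "
          f"OpenRouter ${routed:.2f}, difference ${direct - routed:.2f}")
```

On the lighter workload that is $17.50 against $14.40, so the endpoint matters nearly as much as the choice of model.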

So which one

Default to Kimi K2.7 Code if you bill by the token and your work fits inside 256K. It is cheaper where coding agents spend the most, and the savings are real even if modest. Reach for GLM 5.2 when context is the constraint and you want the 1M window for repo-scale jobs, or when you would rather lean on the model that at least published standard benchmark scores, generous as they may be.

And because both are open weight, the cleanest tiebreaker is free: pull each one, run the same three tasks from your own backlog, and watch which ships the better diff. That test costs you an afternoon and beats every vendor chart on this page.
