TokenCost
Model Release · April 23, 2026 · 7 min read

Kimi K2.6: the fastest model in the top 5, at the lowest price

Released April 20 by Moonshot AI, K2.6 scores 54 on the Artificial Analysis Intelligence Index - 3 points behind the frontier trio, but 134 tokens per second and $1.71 per million tokens blended. That combination does not exist anywhere else in the top 5 right now.

(Cover image: Earth at night from space, city lights forming glowing network clusters against deep black. Photo by NASA on Unsplash.)

  • $0.95/1M input, $4.00/1M output at Moonshot direct - cheapest in the top 5 globally
  • AA Intelligence Index score: 54, 3 points behind the frontier trio all tied at 57
  • 134 tokens/second output - faster than anything ranked above it
  • Context window is 256K. Every model above it has 1M+

K2.6 vs K2.5: a bigger jump than the version number implies

Kimi K2.5 scored 47 on the AA Intelligence Index. K2.6 scores 54 - a 7-point jump in a single generation, enough to go from a budget-tier option to the #4 model globally. For reference, only 3 points separate K2.6 from Claude Opus 4.7 at the very top.

The practical improvements are concentrated in agentic work. K2.5 could spawn 100 sub-agents with up to 1,500 coordinated tool calls. K2.6 scales that to 300 sub-agents and 4,000 steps. One team at Moonshot ran it autonomously for 5 days handling infrastructure monitoring and incident response without human oversight.

Coding benchmarks improved meaningfully too. SWE-Bench Verified went from 76.8% to 80.2%. That puts K2.6 within rounding error of Claude Opus 4.7 (80.8%) and Gemini 3.1 Pro (80.6%) on that specific test - models that cost 3x to 6x more per token.

API pricing across all providers

K2.6 is available on eight providers. Moonshot direct and Fireworks have the fastest inference; Parasail is cheapest at $0.60/$2.80; OpenRouter routes to DeepInfra at fp4 quantization if you use their aggregated endpoint.

| Provider | Input / 1M | Cached / 1M | Output / 1M | Speed |
| --- | --- | --- | --- | --- |
| Moonshot (direct) | $0.95 | $0.16 | $4.00 | 134 t/s |
| Fireworks AI | $0.95 | $0.16 | $4.00 | 85 t/s |
| Cloudflare Workers AI | $0.95 | $0.16 | $4.00 | 64 t/s |
| OpenRouter (DeepInfra fp4) | $0.75 | $0.15 | $3.50 | varies |
| Parasail | $0.60 | - | $2.80 | 17 t/s |
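As a sanity check on the table, here is a short sketch of what one request costs per provider. Prices are copied from the table above; the 10K-input/1K-output request shape is an arbitrary assumption, and prompt caching is ignored:

```python
# $ per 1M tokens (input, output), copied from the provider table above.
PRICES = {
    "Moonshot (direct)": (0.95, 4.00),
    "Fireworks AI": (0.95, 4.00),
    "Cloudflare Workers AI": (0.95, 4.00),
    "OpenRouter (DeepInfra fp4)": (0.75, 3.50),
    "Parasail": (0.60, 2.80),
}

def request_cost(provider: str, input_tokens: int, output_tokens: int) -> float:
    """Dollar cost of one request, ignoring prompt caching."""
    inp, out = PRICES[provider]
    return (input_tokens * inp + output_tokens * out) / 1e6

# Arbitrary example shape: 10K tokens in, 1K tokens out.
for name in PRICES:
    print(f"{name}: ${request_cost(name, 10_000, 1_000):.4f}")
```

At that shape the per-request differences are fractions of a cent; provider choice only matters at volume or when speed is the constraint.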

Sources: platform.kimi.ai · artificialanalysis.ai · tokencost.app/pricing

What '4th globally' actually costs vs the top 3

The Artificial Analysis blended metric uses a 3:1 input/output weighting (3 input tokens per 1 output token). At that ratio, K2.6 on Moonshot direct comes out to $1.71/1M. Here is how that sits against the models ranked above it:

| Model | AA score | Blended / 1M | Speed |
| --- | --- | --- | --- |
| Claude Opus 4.7 | 57 | $10.00 | 42 t/s |
| Gemini 3.1 Pro Preview | 57 | $4.50 | 129 t/s |
| GPT-5.4 | 57 | $5.63 | 76 t/s |
| Kimi K2.6 | 54 | $1.71 | 134 t/s |
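The blended figure is easy to reproduce. A minimal sketch of the 3:1 weighting described above:

```python
def blended_per_million(input_price: float, output_price: float) -> float:
    """Artificial Analysis blend: 3 input tokens per 1 output token."""
    return (3 * input_price + 1 * output_price) / 4

# K2.6 at Moonshot direct: $0.95 input / $4.00 output.
print(blended_per_million(0.95, 4.00))  # 1.7125 -> the $1.71 in the table
```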

Three points on the intelligence index separate K2.6 from the top. The cheapest model ranked above it is Gemini 3.1 Pro at $4.50/1M blended - 2.6x what K2.6 costs. Claude Opus 4.7 runs at $10/1M, nearly 6 times the price. Whether that gap is worth paying depends entirely on your use case.

Where K2.6 performs and where it does not

The intelligence score gap closes considerably on coding-specific tasks. On SWE-Bench Verified, K2.6 (80.2%) essentially ties Gemini 3.1 Pro (80.6%) and Claude Opus 4.7 (80.8%). On LiveCodeBench v6, it scores 89.6% - above Claude Opus 4.7 at 88.8%.

| Benchmark | Kimi K2.6 | Gemini 3.1 Pro | Claude Opus 4.7 |
| --- | --- | --- | --- |
| SWE-Bench Verified | 80.2% | 80.6% | 80.8% |
| LiveCodeBench v6 | 89.6% | 91.7% | 88.8% |
| SWE-Bench Pro | 58.6% | 54.2% | 53.4% |
| HLE-Full (no tools) | 34.7% | 44.4% | 40.0% |
| AIME 2026 | 96.4% | 98.3% | 96.7% |

The one area where K2.6 falls back is raw knowledge QA without tools. HLE-Full (no tools) is 34.7% - noticeably behind both Gemini and Claude. This matters if you are doing complex research tasks without tool use. For anything agentic with tools enabled, that gap mostly disappears.

The cost math at real usage volumes

Using a 50/50 input/output split (reasonable for agentic coding pipelines), here is what different monthly token volumes cost across the top models:

| Monthly volume | Kimi K2.6 | Gemini 3.1 Pro | GPT-5.4 | Claude Opus 4.7 |
| --- | --- | --- | --- | --- |
| 10M tokens | $25 | $70 | $88 | $150 |
| 100M tokens | $248 | $700 | $875 | $1,500 |
| 1B tokens | $2,475 | $7,000 | $8,750 | $15,000 |

50/50 input/output split, Moonshot direct pricing ($0.95 input / $4.00 output). Enable prompt caching and cached input drops from $0.95 to $0.16/1M - the savings compound quickly on repeated system prompts.
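The table values and the caching effect can be reproduced in a few lines. The 80% cache-hit rate below is an illustrative assumption, not a measured figure:

```python
def monthly_cost(total_tokens: float, cached_fraction: float = 0.0) -> float:
    """Monthly cost at Moonshot direct prices, 50/50 input/output split.

    cached_fraction is the share of input tokens billed at the cached
    rate ($0.16/1M) instead of the full input rate ($0.95/1M).
    """
    input_tokens = output_tokens = total_tokens / 2
    input_rate = (1 - cached_fraction) * 0.95 + cached_fraction * 0.16
    return (input_tokens * input_rate + output_tokens * 4.00) / 1e6

print(monthly_cost(1e9))                 # 2475.0 -> the 1B-token row above
print(round(monthly_cost(1e9, 0.8)))     # same volume with 80% cache hits
```

Note that output tokens dominate the bill at a 50/50 split, so caching trims the input side but the $4.00/1M output rate sets the floor.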

At 1B monthly tokens, the gap between K2.6 and Claude Opus 4.7 is $12,525/month - for a model that is 3 points behind on the intelligence index and faster at generating output. That math is hard to ignore unless you have a specific use case where those 3 points genuinely matter for your application.

The 256K context window is the real tradeoff

Kimi K2.6 has a 256K token context window (262,144 exactly). The frontier trio all sit at 1M tokens or more. That is not a small difference for large codebase work. If you are building an agent that needs to hold an entire monorepo in context across multiple files, the 256K limit will hit you.

Under the hood it is a 1-trillion total parameter MoE model with 32 billion active parameters per forward pass. That architecture - large total capacity, small active slice - is why inference is fast: the full 1T parameters still have to fit in memory (or be offloaded), but per-token compute is that of a 32B model, which is what makes self-hosting practical at all.

The weights are released under a modified MIT license on Hugging Face, so you can run it yourself. The model ID is moonshotai/Kimi-K2.6. Both vLLM and SGLang support it. For teams that cannot route code to external APIs, this is one of the few options in the top 5 globally.

How to access it

The Moonshot API at platform.kimi.ai uses an OpenAI-compatible endpoint. The model ID is kimi-k2.6. If you are already calling GPT-5.4 or Claude through an OpenAI-compatible wrapper, testing K2.6 is a base-URL and model-ID change.
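Since the endpoint is OpenAI-compatible, the request shape is the standard chat-completions payload. This sketch only builds the request without sending it; the exact base URL path is an assumption, so verify it against Moonshot's docs:

```python
import json
import urllib.request

API_BASE = "https://platform.kimi.ai/v1"  # assumed path; check Moonshot's docs
API_KEY = "YOUR_MOONSHOT_API_KEY"

payload = {
    "model": "kimi-k2.6",  # model ID from Moonshot's platform
    "messages": [
        {"role": "user", "content": "Summarize this stack trace in one sentence."}
    ],
}

# Standard OpenAI-compatible chat completions request (built but not sent here).
req = urllib.request.Request(
    f"{API_BASE}/chat/completions",
    data=json.dumps(payload).encode("utf-8"),
    headers={
        "Authorization": f"Bearer {API_KEY}",
        "Content-Type": "application/json",
    },
)
# with urllib.request.urlopen(req) as resp:   # uncomment with a real key
#     print(json.load(resp)["choices"][0]["message"]["content"])
print(req.full_url)
```

Pointing an existing OpenAI SDK client at the same base URL accomplishes the same thing with less ceremony.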

Fireworks AI also has it at the same price with lower TTFT (0.73s vs 1.04s on Moonshot). If latency matters more than raw throughput, Fireworks is worth testing. For the absolute cheapest option, Parasail is $0.60/$2.80 but at 17 tokens/second - usable for batch jobs, not real-time.
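A rough way to compare the two is total wall-clock time per response: TTFT plus decode time. Using the TTFT and throughput numbers quoted above (the 500-token response length is an arbitrary assumption):

```python
def response_time(ttft_s: float, tokens_per_s: float, output_tokens: int) -> float:
    """Approximate wall-clock time for one response: TTFT plus decode time."""
    return ttft_s + output_tokens / tokens_per_s

# TTFT and throughput figures quoted in the post.
for name, ttft, tps in [("Moonshot direct", 1.04, 134), ("Fireworks AI", 0.73, 85)]:
    print(f"{name}: {response_time(ttft, tps, 500):.1f}s for a 500-token response")
```

On these numbers the crossover is around 70 output tokens: Fireworks' lower TTFT wins for short completions, and Moonshot's higher throughput wins for anything longer.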

The Kimi Code CLI at kimi.com/code is the consumer-facing coding agent built on K2.6. Worth testing as a reference implementation if you are building something similar.

Where this fits

Three points on the intelligence index is real. If you are doing work where those points show up as measurable errors, pay for a 57-scorer. Gemini 3.1 Pro at $4.50/1M blended is probably the right call there - frontier quality without the Claude Opus 4.7 premium.

But for coding pipelines, agentic workflows with tools, or anything where speed and volume matter more than pure knowledge breadth, K2.6 at $1.71/1M blended is hard to argue against. Its SWE-Bench numbers are essentially identical to the top models, it generates output faster than anything ranked above it, and if 256K of context covers your workload, the premium for a 57-scorer is hard to justify on coding benchmarks alone.

Sources