TokenCost
Model Release · April 23, 2026 · 7 min read

Kimi K2.6: the fastest model in the top 5, at the lowest price

Released April 20 by Moonshot AI, K2.6 scores 54 on the Artificial Analysis Intelligence Index - 3 points behind the frontier trio, but 134 tokens per second and $1.71 per million tokens blended. That combination does not exist anywhere else in the top 5 right now.

(Cover image: Earth at night from space, city lights forming glowing network clusters against deep black. Photo by NASA on Unsplash.)

  • $0.95/1M input, $4.00/1M output at Moonshot direct - cheapest in the top 5 globally
  • AA Intelligence Index score: 54, 3 points behind the frontier trio all tied at 57
  • 134 tokens/second output - faster than anything ranked above it
  • Context window is 256K. Every model above it has 1M+

K2.6 vs K2.5: a bigger jump than the version number implies

Kimi K2.5 scored 47 on the AA Intelligence Index. K2.6 scores 54 - a 7-point jump in a single generation, enough to go from a budget-tier option to the #4 model globally. For reference, only 3 points separate K2.6 from Claude Opus 4.7 at the very top.

The practical improvements are concentrated in agentic work. K2.5 could spawn 100 sub-agents with up to 1,500 coordinated tool calls. K2.6 scales that to 300 sub-agents and 4,000 steps. One team at Moonshot ran it autonomously for 5 days handling infrastructure monitoring and incident response without human oversight.

Coding benchmarks improved meaningfully too. SWE-Bench Verified went from 76.8% to 80.2%. That puts K2.6 within rounding error of Claude Opus 4.7 (80.8%) and Gemini 3.1 Pro (80.6%) on that specific test - models that cost 3x to 6x more per token.

API pricing across all providers

K2.6 is available on eight providers. Moonshot direct and Fireworks have the fastest inference; Parasail is cheapest at $0.60/$2.80; OpenRouter routes to DeepInfra at fp4 quantization if you use their aggregated endpoint.

| Provider | Input / 1M | Cached / 1M | Output / 1M | Speed |
| --- | --- | --- | --- | --- |
| Moonshot (direct) | $0.95 | $0.16 | $4.00 | 134 t/s |
| Fireworks AI | $0.95 | $0.16 | $4.00 | 85 t/s |
| Cloudflare Workers AI | $0.95 | $0.16 | $4.00 | 64 t/s |
| OpenRouter (DeepInfra fp4) | $0.75 | $0.15 | $3.50 | varies |
| Parasail | $0.60 | - | $2.80 | 17 t/s |
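As a sanity check on the table, here is a short sketch of what one request costs per provider. Prices are copied from the table above; the 10K-input/1K-output request shape is an arbitrary assumption, and prompt caching is ignored:

```python
# $ per 1M tokens (input, output), copied from the provider table above.
PRICES = {
    "Moonshot (direct)": (0.95, 4.00),
    "Fireworks AI": (0.95, 4.00),
    "Cloudflare Workers AI": (0.95, 4.00),
    "OpenRouter (DeepInfra fp4)": (0.75, 3.50),
    "Parasail": (0.60, 2.80),
}

def request_cost(provider: str, input_tokens: int, output_tokens: int) -> float:
    """Dollar cost of one request, ignoring prompt caching."""
    inp, out = PRICES[provider]
    return (input_tokens * inp + output_tokens * out) / 1e6

# Arbitrary example shape: 10K tokens in, 1K tokens out.
for name in PRICES:
    print(f"{name}: ${request_cost(name, 10_000, 1_000):.4f}")
```

At that shape the per-request differences are fractions of a cent; provider choice only matters at volume or when speed is the constraint.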

Sources: platform.kimi.ai · artificialanalysis.ai · tokencost.app/pricing

What '4th globally' actually costs vs the top 3

The Artificial Analysis blended metric uses a 3:1 input/output weighting (3 input tokens per 1 output token). At that ratio, K2.6 on Moonshot direct comes out to $1.71/1M. Here is how that sits against the models ranked above it:

| Model | AA score | Blended / 1M | Speed |
| --- | --- | --- | --- |
| Claude Opus 4.7 | 57 | $10.00 | 42 t/s |
| Gemini 3.1 Pro Preview | 57 | $4.50 | 129 t/s |
| GPT-5.4 | 57 | $5.63 | 76 t/s |
| Kimi K2.6 | 54 | $1.71 | 134 t/s |
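The blended figure is easy to reproduce. A minimal sketch of the 3:1 weighting described above:

```python
def blended_per_million(input_price: float, output_price: float) -> float:
    """Artificial Analysis blend: 3 input tokens per 1 output token."""
    return (3 * input_price + 1 * output_price) / 4

# K2.6 at Moonshot direct: $0.95 input / $4.00 output.
print(blended_per_million(0.95, 4.00))  # 1.7125 -> the $1.71 in the table
```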

Three points on the intelligence index separate K2.6 from the top. The cheapest model ranked above it is Gemini 3.1 Pro at $4.50/1M blended - 2.6x what K2.6 costs. Claude Opus 4.7 runs at $10/1M, nearly 6 times the price. Whether that gap is worth paying depends entirely on your use case.

Where K2.6 performs and where it does not

The intelligence score gap closes considerably on coding-specific tasks. On SWE-Bench Verified, K2.6 (80.2%) essentially ties Gemini 3.1 Pro (80.6%) and Claude Opus 4.7 (80.8%). On LiveCodeBench v6, it scores 89.6% - above Claude Opus 4.7 at 88.8%.

| Benchmark | Kimi K2.6 | Gemini 3.1 Pro | Claude Opus 4.7 |
| --- | --- | --- | --- |
| SWE-Bench Verified | 80.2% | 80.6% | 80.8% |
| LiveCodeBench v6 | 89.6% | 91.7% | 88.8% |
| SWE-Bench Pro | 58.6% | 54.2% | 53.4% |
| HLE-Full (no tools) | 34.7% | 44.4% | 40.0% |
| AIME 2026 | 96.4% | 98.3% | 96.7% |

The one area where K2.6 falls back is raw knowledge QA without tools. HLE-Full (no tools) is 34.7% - noticeably behind both Gemini and Claude. This matters if you are doing complex research tasks without tool use. For anything agentic with tools enabled, that gap mostly disappears.

The cost math at real usage volumes

Using a 50/50 input/output split (reasonable for agentic coding pipelines), here is what different monthly token volumes cost across the top models:

| Monthly volume | Kimi K2.6 | Gemini 3.1 Pro | GPT-5.4 | Claude Opus 4.7 |
| --- | --- | --- | --- | --- |
| 10M tokens | $25 | $70 | $88 | $150 |
| 100M tokens | $248 | $700 | $875 | $1,500 |
| 1B tokens | $2,475 | $7,000 | $8,750 | $15,000 |

50/50 input/output split, Moonshot direct pricing ($0.95 input / $4.00 output). Enable prompt caching and cached input drops from $0.95 to $0.16/1M - the savings compound quickly on repeated system prompts.
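The table values and the caching effect can be reproduced in a few lines. The 80% cache-hit rate below is an illustrative assumption, not a measured figure:

```python
def monthly_cost(total_tokens: float, cached_fraction: float = 0.0) -> float:
    """Monthly cost at Moonshot direct prices, 50/50 input/output split.

    cached_fraction is the share of input tokens billed at the cached
    rate ($0.16/1M) instead of the full input rate ($0.95/1M).
    """
    input_tokens = output_tokens = total_tokens / 2
    input_rate = (1 - cached_fraction) * 0.95 + cached_fraction * 0.16
    return (input_tokens * input_rate + output_tokens * 4.00) / 1e6

print(monthly_cost(1e9))                 # 2475.0 -> the 1B-token row above
print(round(monthly_cost(1e9, 0.8)))     # same volume with 80% cache hits
```

Note that output tokens dominate the bill at a 50/50 split, so caching trims the input side but the $4.00/1M output rate sets the floor.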

At 1B monthly tokens, the gap between K2.6 and Claude Opus 4.7 is $12,525/month - for a model that is 3 points behind on the intelligence index and faster at generating output. That math is hard to ignore unless you have a specific use case where those 3 points genuinely matter for your application.

The 256K context window is the real tradeoff

Kimi K2.6 has a 256K token context window (262,144 exactly). The frontier trio all sit at 1M tokens or more. That is not a small difference for large codebase work. If you are building an agent that needs to hold an entire monorepo in context across multiple files, the 256K limit will hit you.

Under the hood it is a 1-trillion total parameter MoE model with 32 billion active parameters per forward pass. That architecture - large total capacity, small active slice - is why inference is fast: the full 1T parameters still have to fit in memory (or be offloaded), but per-token compute is that of a 32B model, which is what makes self-hosting practical at all.

The weights are released under a modified MIT license on Hugging Face, so you can run it yourself. The model ID is moonshotai/Kimi-K2.6. Both vLLM and SGLang support it. For teams that cannot route code to external APIs, this is one of the few options in the top 5 globally.

How to access it

The Moonshot API at platform.kimi.ai uses an OpenAI-compatible endpoint. The model ID is kimi-k2.6. If you are already calling GPT-5.4 or Claude through an OpenAI-compatible wrapper, testing K2.6 is a base-URL and model-ID change.
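Since the endpoint is OpenAI-compatible, the request shape is the standard chat-completions payload. This sketch only builds the request without sending it; the exact base URL path is an assumption, so verify it against Moonshot's docs:

```python
import json
import urllib.request

API_BASE = "https://platform.kimi.ai/v1"  # assumed path; check Moonshot's docs
API_KEY = "YOUR_MOONSHOT_API_KEY"

payload = {
    "model": "kimi-k2.6",  # model ID from Moonshot's platform
    "messages": [
        {"role": "user", "content": "Summarize this stack trace in one sentence."}
    ],
}

# Standard OpenAI-compatible chat completions request (built but not sent here).
req = urllib.request.Request(
    f"{API_BASE}/chat/completions",
    data=json.dumps(payload).encode("utf-8"),
    headers={
        "Authorization": f"Bearer {API_KEY}",
        "Content-Type": "application/json",
    },
)
# with urllib.request.urlopen(req) as resp:   # uncomment with a real key
#     print(json.load(resp)["choices"][0]["message"]["content"])
print(req.full_url)
```

Pointing an existing OpenAI SDK client at the same base URL accomplishes the same thing with less ceremony.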

Fireworks AI also has it at the same price with lower TTFT (0.73s vs 1.04s on Moonshot). If latency matters more than raw throughput, Fireworks is worth testing. For the absolute cheapest option, Parasail is $0.60/$2.80 but at 17 tokens/second - usable for batch jobs, not real-time.
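A rough way to compare the two is total wall-clock time per response: TTFT plus decode time. Using the TTFT and throughput numbers quoted above (the 500-token response length is an arbitrary assumption):

```python
def response_time(ttft_s: float, tokens_per_s: float, output_tokens: int) -> float:
    """Approximate wall-clock time for one response: TTFT plus decode time."""
    return ttft_s + output_tokens / tokens_per_s

# TTFT and throughput figures quoted in the post.
for name, ttft, tps in [("Moonshot direct", 1.04, 134), ("Fireworks AI", 0.73, 85)]:
    print(f"{name}: {response_time(ttft, tps, 500):.1f}s for a 500-token response")
```

On these numbers the crossover is around 70 output tokens: Fireworks' lower TTFT wins for short completions, and Moonshot's higher throughput wins for anything longer.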

The Kimi Code CLI at kimi.com/code is the consumer-facing coding agent built on K2.6. Worth testing as a reference implementation if you are building something similar.

Where this fits

Three points on the intelligence index is real. If you are doing work where those points show up as measurable errors, pay for a 57-scorer. Gemini 3.1 Pro at $4.50/1M blended is probably the right call there - frontier quality without the Claude Opus 4.7 premium.

But for coding pipelines, agentic workflows with tools, or anything where speed and volume matter more than pure knowledge breadth, K2.6 at $1.71/1M blended is hard to argue against. Its SWE-Bench numbers are essentially identical to the top models, it generates output faster than anything ranked above it, and if 256K of context covers your workload, the premium for a 57-scorer is hard to justify on coding benchmarks alone.

Sources