Model Release · May 17, 2026 · 11 min read

Qwen3.6 Max Preview is the first Qwen flagship Alibaba shipped closed. It took SWE-Bench Pro, lost SWE-Bench Verified by omission, and costs 4x what Qwen3.6 Plus does.

Alibaba launched Qwen3.6 Max Preview on April 20 at $1.30 input and $7.80 output per million tokens. Four weeks in, three things stand out. The model unseated GLM-5.1 on SWE-Bench Pro and Claude Opus 4.7 on Terminal-Bench 2.0. The scoresheet has no SWE-Bench Verified row, even though every competitor leads with that benchmark. And the cheaper Qwen3.6 Plus, which still ships open weights and a 1M context, holds its own on everything except the agentic coding boards. The routing math is not what the headline suggests.

Qwen3.6 Max Preview launch banner from Alibaba Cloud announcement page

Image source: Alibaba Cloud / Qwen Team

Two rate cards for the same model, plus a cache tier nobody surfaces

Alibaba Cloud's Model Studio surfaces Max Preview at $1.30 input and $7.80 output per million tokens. OpenRouter is currently running a 20% off promo, which puts the effective rate at $1.04 input and $6.24 output. Artificial Analysis lists a cached input tier near $0.13 per million, a 90% discount in line with how the other Chinese frontier coding models price cache reads. Alibaba does not surface that number directly on the model dashboard, which is why most coverage misses it.

Endpoint	Input / 1M	Output / 1M	Cached input	Context
Alibaba DashScope (list)	$1.30	$7.80	~$0.13	256K
OpenRouter (20% off)	$1.04	$6.24	n/p	262K
Max output ceiling	-	-	-	32,768 out
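
The promo and cache figures in the table can be re-derived from the list rates. The snippet below is a minimal sketch of that arithmetic, assuming the 20% promo applies uniformly to input and output; the function and constant names are illustrative, not part of any Alibaba or OpenRouter API.

```python
# List rates from the table above, in USD per million tokens.
LIST_INPUT = 1.30
LIST_OUTPUT = 7.80
CACHED_INPUT = 0.13  # approximate figure reported by Artificial Analysis


def apply_discount(rate: float, discount: float) -> float:
    """Return the rate after a fractional discount (0.20 means 20% off)."""
    return round(rate * (1 - discount), 4)


# Assumption: the OpenRouter promo is a flat 20% on both directions.
promo_input = apply_discount(LIST_INPUT, 0.20)
promo_output = apply_discount(LIST_OUTPUT, 0.20)

# Cache-read discount relative to the fresh list input rate.
cache_discount = 1 - CACHED_INPUT / LIST_INPUT

print(f"promo input:  ${promo_input:.2f}")
print(f"promo output: ${promo_output:.2f}")
print(f"cache discount vs list input: {cache_discount:.0%}")
```

If the promo ends or the cached rate changes, only the three constants at the top need updating.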

The 32K output ceiling is the one to watch. Long-form deep-research agents that stream 60K+ token reports will hit it. The cheaper Qwen3.6 Plus has a 1M context and a higher output ceiling, which matters when you size autonomous coding loops that emit large diffs across many turns. Max Preview is built for a different workload shape than what the parameter count alone might suggest.
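
To size the output ceiling against a long report, a rough lower bound on the number of generation calls is enough. This sketch assumes the report can be split cleanly across calls and ignores any overhead for re-sending context on continuation; `calls_needed` is a hypothetical helper name.

```python
import math

MAX_OUTPUT_TOKENS = 32_768  # Max Preview output ceiling from the table above


def calls_needed(report_tokens: int, ceiling: int = MAX_OUTPUT_TOKENS) -> int:
    """Minimum number of generation calls needed to emit `report_tokens`."""
    if report_tokens <= 0:
        return 0
    return math.ceil(report_tokens / ceiling)


print(calls_needed(60_000))  # 2: a 60K-token report cannot fit in one call
print(calls_needed(32_768))  # 1: exactly at the ceiling
```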

Six benchmarks shipped, one benchmark missing

Official Qwen3.6 Max Preview benchmark comparison chart

The Alibaba Cloud launch post leads with four agentic coding boards (SWE-Bench Pro, Terminal-Bench 2.0, SkillsBench, QwenClawBench) and four reasoning ones (GPQA Diamond, AIME 2025, SciCode, QwenWebBench). The omission is SWE-Bench Verified. Competing frontier models lead with it: Opus 4.7 at 87.6 and Kimi K2.6 at 80.2, with GPT-5.5 unpublished but estimated near 85.

Benchmark	Max Preview	Kimi K2.6	Opus 4.7	Qwen3.6 Plus
SWE-Bench Verified	not published	80.2	87.6	78.8
SWE-Bench Pro	#1 (unseated GLM-5.1)	58.6	64.3	n/p
Terminal-Bench 2.0	69.4	66.7	62.1*	n/p
GPQA Diamond	86.0	n/p	n/p	83.4
AIME 2025	93.0	n/p	n/p	91.2
AA Intelligence Index	52	54	63	48
Output tok/s	37.7	54	72	91

*Opus 4.7's Terminal-Bench 2.0 score was the #1 result until Max Preview took the spot. The chart Alibaba shipped lists Max Preview at 69.4 and Opus at 62.1, both run on the same harness.

About that blank SWE-Bench Verified row. Third-party runs have surfaced numbers from 73 to 85 depending on the harness. The official Qwen sheet contains none of them. The cleanest read is that Verified did not produce a chart-leading number, and Alibaba kept the chart pointed at boards where it did. Qwen3.6 Plus (the open sibling) published 78.8 on Verified, so the Qwen team can run the benchmark and knows how to score it. The omission is a choice.

The closed-weights move is the part nobody is calling out

Every Qwen flagship since the original Qwen-Max in 2024 has shipped with weights. Qwen2.5-Max put weights on Hugging Face the same day as the API endpoint went live. Qwen3 Max, Qwen3.5 Max, and Qwen3.6 Plus all followed that pattern. Qwen3 Coder Next sits at $0.11 input under Apache 2.0 right now. The open-weights posture is a Qwen brand identity, not an accident.

Qwen3.6 Max Preview broke that. No Hugging Face listing. No model card. No parameter file. The OpenRouter description names the architecture as a mixture-of-experts in the ~1T total parameter range, but the active-parameter count is not disclosed. Self-host is not an option. The only path is the API, and Alibaba controls the only meter on it.

For routing decisions this matters. The cheap-tier playbook for a year has been to use an open-weights frontier model as a hedge against API price hikes from the model vendor. Max Preview is the first time the cheapest hedge slot in this tier (Qwen) does not give you that option. If the SWE-Bench Pro top spot is valuable enough that you would route to Max Preview, you are now exposed to whatever Alibaba decides the post-promo price should be. The Plus sibling still ships open under Apache 2.0 and runs about half a quality tier lower.

Four workload shapes, five rate cards

The table prices four workload shapes against the models you would otherwise route an agentic coding loop to, including a short RAG answer that catches the per-call latency tier. DeepSeek V4 Pro is shown at promo pricing because the promo runs through May 31. Max Preview uses Alibaba list rates; OpenRouter is 20% lower while the launch promo holds.

Workload	Max Preview	Qwen3.6 Plus	Kimi K2.6	DeepSeek V4 Pro	Opus 4.7
Agentic turn (50K in / 10K out)	$0.143	$0.030	$0.088	$0.030	$0.500
Big refactor (200K in / 30K out)	$0.494	$0.105	$0.310	$0.113	$1.750
RAG answer (10K in / 800 out)	$0.019	$0.004	$0.013	$0.005	$0.070
1B tokens / month (70 / 30 blend)	$3,250	$689	$1,865	$566	$11,000
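
The per-workload figures follow from a simple linear formula: input tokens times the input rate plus output tokens times the output rate. The sketch below reproduces three of the columns using only rates stated in this article (Max Preview list, DeepSeek V4 Pro promo, Opus 4.7). It assumes the "70 / 30 blend" means 700M input and 300M output tokens, which is the reading that matches the table; the dictionary and function names are illustrative.

```python
# USD per million tokens as (input, output), taken from this article.
RATES = {
    "Max Preview": (1.30, 7.80),
    "DeepSeek V4 Pro (promo)": (0.435, 0.87),
    "Opus 4.7": (5.00, 25.00),
}

# (input tokens, output tokens) per workload, as in the table.
# Assumption: the 1B-token month splits 70% input / 30% output.
WORKLOADS = {
    "Agentic turn": (50_000, 10_000),
    "Big refactor": (200_000, 30_000),
    "RAG answer": (10_000, 800),
    "1B tokens / month": (700_000_000, 300_000_000),
}


def cost_usd(model: str, tokens_in: int, tokens_out: int) -> float:
    """Linear token cost: no caching, no promo beyond what RATES encodes."""
    rate_in, rate_out = RATES[model]
    return (tokens_in * rate_in + tokens_out * rate_out) / 1_000_000


for name, (t_in, t_out) in WORKLOADS.items():
    row = ", ".join(f"{m}: ${cost_usd(m, t_in, t_out):,.3f}" for m in RATES)
    print(f"{name}: {row}")
```

Adding another model is a one-line change to `RATES`, provided its published rates are known.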

Three things to notice. Plus runs the same workloads for roughly 21 cents on the Max-Preview dollar at the billion-token tier, while trailing Max Preview on AIME and GPQA by only small margins. DeepSeek V4 Pro at promo pricing is the cheapest credible option on the board, but the promo ends in two weeks and the post-promo number runs about $2,260 per billion tokens, which lands above Plus and below Max Preview. Max Preview's bill comes to a little under a third of Claude Opus 4.7's at every tier, which is the slot the closed Chinese frontier has occupied for most of 2026.

The cached input tier changes the agentic-turn math. At $0.13 per million on cache reads, a session with a stable 5K-token system prompt and a 30K-token tool schema pays about $0.005 per turn for that 35K-token prefix after the first turn, instead of about $0.046 at the fresh rate. Output stays at $7.80, which is still the dominant bill driver for any agent that emits long diffs. The cache helps but does not flip the verdict against Qwen3.6 Plus, which also offers cache reads at a similar discount and starts from a lower fresh-input base.
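
The cache arithmetic can be checked in a few lines. This is a sketch under two assumptions: the whole 35K-token prefix is served from cache on every turn after the first, and each turn emits 10K output tokens (the agentic-turn shape from the table). The names are illustrative.

```python
FRESH_INPUT = 1.30   # USD per million tokens, list input rate
CACHED_INPUT = 0.13  # USD per million tokens, approximate cache-read rate
OUTPUT = 7.80        # USD per million tokens, list output rate

PREFIX_TOKENS = 5_000 + 30_000  # system prompt + tool schema
OUTPUT_TOKENS = 10_000          # assumed output per turn


def token_cost(tokens: int, rate_per_million: float) -> float:
    """Cost in USD for `tokens` at a per-million-token rate."""
    return tokens * rate_per_million / 1_000_000


prefix_fresh = token_cost(PREFIX_TOKENS, FRESH_INPUT)
prefix_cached = token_cost(PREFIX_TOKENS, CACHED_INPUT)
output_cost = token_cost(OUTPUT_TOKENS, OUTPUT)

print(f"prefix, fresh:  ${prefix_fresh:.5f}")
print(f"prefix, cached: ${prefix_cached:.5f}")
print(f"output per turn: ${output_cost:.5f}")
```

Under these assumptions the output line is more than an order of magnitude larger than the cached prefix, which is why the cache tier does not change the ranking.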

Where Max Preview earns the routing slot

One paragraph per workload. Verdicts at the end of each.

SWE-Bench Pro-shaped work. This is the benchmark Max Preview actually leads. If your evaluation harness scores on SWE-Bench Pro (real-world repo issues that require multi-file edits and test-driven iteration), and Verified is not the contractual KPI, Max Preview is the cheapest model that holds the top slot today. The Western closed frontier (Opus 4.7 at $5 input and $25 output) charges roughly three to four times as much per token, with a benchmark gap measured in single digits. Route here.

Terminal-driven autonomous agents. Terminal-Bench 2.0 at 69.4 is the new SOTA. Shell session ownership, environment setup, long-horizon command sequences, container manipulation, all map to this score. The best open-weights alternative is Kimi K2.6 at 66.7 (close, and faster at 54 tok/s), and the closed frontier comparison is Opus 4.7 at 62.1. Route to Max Preview if Terminal-Bench is the workload shape.

SWE-Bench Verified work. Without a published number, Max Preview is hard to size against Kimi K2.6 (80.2), Opus 4.7 (87.6), and even Qwen3.6 Plus (78.8). For contracts where Verified is the bar, route to Qwen3.6 Plus first (open, a quarter of the input cost, 1M context), Kimi K2.6 if you need more headroom, Opus 4.7 if quality has to top everyone. Max Preview should not be the default until the number lands.

Math and formal reasoning. AIME 2025 at 93 and GPQA Diamond at 86 are respectable but trail Gemini 3.1 Pro on GPQA (94.3) and run roughly level with several other models on AIME. The price premium does not justify Max Preview here. Route to Gemini 3.1 Pro or pay the closed-frontier premium.

Long-context loads past 256K. OpenRouter caps Max Preview at 262K, Alibaba at 256K. Both sit well under the 1M ceilings on Qwen3.6 Plus, Gemini 3.1 Pro, GPT-5.5, and Claude Opus 4.7. Whole-repo loads or long retrieval contexts route elsewhere. The closest cheap alternative is Qwen3.6 Plus at 1M context.

Latency-sensitive agentic loops. Max Preview runs at 37.7 tokens per second with a 3.4-second time to first token. That puts it in the bottom third of Artificial Analysis's speed board. For an agent that takes 8-12 turns per task and emits a few thousand tokens each turn, the wall-clock cost adds up fast. Kimi K2.6 at 54 tok/s and Qwen3.6 Plus at 91 tok/s both feel snappier in the same loop. Route to Plus or Kimi if user-visible latency is the constraint.
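
A back-of-envelope model makes the wall-clock gap concrete: each turn costs time to first token plus output tokens divided by throughput. The sketch below assumes 10 turns of 3,000 output tokens each, an illustrative point inside the 8-12 turn range above. Time to first token is only given here for Max Preview, so the three-way comparison uses decode time alone.

```python
def loop_seconds(turns: int, tokens_per_turn: int,
                 tok_per_s: float, ttft_s: float = 0.0) -> float:
    """Total generation time for an agent loop, ignoring tool-call time."""
    return turns * (ttft_s + tokens_per_turn / tok_per_s)


TURNS = 10              # assumed, within the 8-12 range
TOKENS_PER_TURN = 3_000  # assumed "few thousand" tokens

# Output speeds (tokens per second) from the benchmark table.
SPEEDS = {"Max Preview": 37.7, "Kimi K2.6": 54.0, "Qwen3.6 Plus": 91.0}

for model, speed in SPEEDS.items():
    decode = loop_seconds(TURNS, TOKENS_PER_TURN, speed)
    print(f"{model}: {decode:.0f} s decode time")

# Max Preview with its 3.4 s time to first token included.
with_ttft = loop_seconds(TURNS, TOKENS_PER_TURN, 37.7, ttft_s=3.4)
print(f"Max Preview incl. TTFT: {with_ttft:.0f} s")
```

Tool execution and network time are left out, so real loops will take longer for every model.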

Workloads where you need open weights. Max Preview is API-only and Alibaba-hosted. If your stack includes a self-hosted requirement (regulated data, on-prem deployment, air-gapped pipeline, or just a hedge against price shifts), Max Preview is not on the menu. Qwen3.6 Plus is the closest sibling with weights you can run yourself.
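
The verdicts above can be condensed into a small rule-based router. This is a hypothetical sketch, not a recommended production policy. The `Workload` fields, the order of the checks, the fallback choice, and reading "256K" as 256,000 tokens are all assumptions made for illustration.

```python
from dataclasses import dataclass

MAX_PREVIEW_CONTEXT = 256_000  # assumption: Alibaba's "256K" read as 256,000


@dataclass
class Workload:
    # One of: "swe_bench_pro", "terminal_bench", "swe_bench_verified",
    # "math", "other" (hypothetical labels for this sketch).
    kpi: str = "other"
    context_tokens: int = 0
    needs_open_weights: bool = False
    latency_sensitive: bool = False


def route(w: Workload) -> str:
    """Pick a model name following the per-workload verdicts above."""
    # Hard constraints first: these rule Max Preview out regardless of KPI.
    if w.needs_open_weights or w.context_tokens > MAX_PREVIEW_CONTEXT:
        return "Qwen3.6 Plus"
    if w.latency_sensitive:
        return "Qwen3.6 Plus"  # Kimi K2.6 is the other option named above
    if w.kpi in ("swe_bench_pro", "terminal_bench"):
        return "Qwen3.6 Max Preview"
    if w.kpi == "swe_bench_verified":
        return "Qwen3.6 Plus"  # first stop; escalate to Kimi K2.6 or Opus 4.7
    if w.kpi == "math":
        return "Gemini 3.1 Pro"
    return "Qwen3.6 Plus"  # assumed default for everything else


print(route(Workload(kpi="terminal_bench")))
print(route(Workload(kpi="swe_bench_pro", context_tokens=500_000)))
```

Putting the hard constraints ahead of the benchmark checks reflects the point that open weights and context length are disqualifiers rather than trade-offs.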

The bet Alibaba is placing

Qwen3.6 Max Preview sits in an awkward slot. It is priced well above its own Plus sibling on workloads Plus handles competently, and it is not cheap enough to displace Kimi K2.6 or DeepSeek V4 Pro at the agentic coding price tier. Benchmark wins are real but narrow, and the SWE-Bench Verified omission is a flag for any contract that names that score as the KPI.

The bigger story is the closed-weights move. Alibaba is testing whether Qwen brand equity holds up without the open-source posture. If Max Preview gains commercial traction at $1.30 input, the next flagship Qwen probably ships closed too. If it does not, the open-weights sibling cycle continues. Either way, this preview is a useful data point about which side of that bet Alibaba is on.

We track all of these on the TokenCost pricing page and the per-task math is in the calculator. For the cheaper open Qwen sibling, see the Qwen3.6 Plus breakdown.

Sources

Alibaba Cloud: Qwen3.6 Max Preview launch - Release April 20 2026, benchmark sheet, official chart
Qwen blog: Qwen3.6-Max-Preview - Official mirror of the launch post
OpenRouter: qwen/qwen3.6-max-preview - $1.04 input, $6.24 output (20% off promo), 262K context
Artificial Analysis: Qwen3.6 Max - $1.30/$7.80 list, AA Intelligence Index 52, 37.7 tok/s
Artificial Analysis: Plus vs Max comparison - Side-by-side benchmark sheet
OpenRouter: Kimi K2.6 - Comparison pricing, SWE-Bench Verified 80.2
DeepSeek: API pricing - V4 Pro promo $0.435/$0.87 through May 31
Anthropic: Claude pricing - Opus 4.7 $5/$25 per 1M
