Qwen3.6-Plus: $0.28 per million input tokens, and the benchmark comparison Alibaba chose not to lead with
Released April 2, 2026. At $0.276 per million input tokens globally, it costs 18x less than Claude Opus 4.6. On multimodal tasks it genuinely outperforms Claude 4.5 Opus. On agentic coding, the current Claude Opus 4.6 still wins - which is why Alibaba's benchmark chart compared against the older 4.5.

Image source: Qwen Blog
- Released April 2. $0.276/M input globally (Global tier, under 256K). 18x less than Claude Opus 4.6.
- Genuinely better than Claude 4.5 Opus on documents, images, and video. Claude 4.6 Opus still wins on agentic coding.
- Alibaba's benchmark chart used Claude 4.5, not 4.6. On the OpenRouter route, Alibaba collects your prompt data.
Pricing by region and context band
The API pricing depends on where you call from and how much context you use per request. Alibaba bills differently for requests under 256K tokens versus longer ones.
| Region | Context band | Input / 1M | Output / 1M |
|---|---|---|---|
| Global (US/EU) | 0 - 256K tokens | $0.276 | $1.651 |
| Global (US/EU) | 256K - 1M tokens | $1.101 | $6.602 |
| International (Singapore) | 0 - 256K tokens | $0.50 | $3.00 |
| International (Singapore) | 256K - 1M tokens | $2.00 | $6.00 |
For context: Claude Opus 4.6 costs $5.00/M input and $25.00/M output with no context band tiering. GPT-5.4 runs $2.50/M input and $15.00/M output. Qwen3.6-Plus on the Global tier, for requests under 256K, is the cheapest of the three by a wide margin. Even for 1M-context requests, the $1.101 Global input rate still undercuts Opus's flat $5.00.
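The band logic above reduces to a small lookup. A sketch, with rates copied from the table (a snapshot from this post, not an authoritative price source) and one assumption flagged in the comments: whether the band is picked by total tokens per request or input context alone is not specified here, so check Alibaba's billing docs.

```python
# Per-million-token rates (USD) from the pricing table above.
# Snapshot as of the April 2026 launch; verify against Alibaba Cloud.
RATES = {
    ("global", "short"): (0.276, 1.651),  # 0 - 256K tokens
    ("global", "long"): (1.101, 6.602),   # 256K - 1M tokens
    ("intl", "short"): (0.50, 3.00),
    ("intl", "long"): (2.00, 6.00),
}

def request_cost(input_tokens: int, output_tokens: int, region: str = "global") -> float:
    """Estimated USD cost of one request.

    Assumption: the band is chosen by total tokens in the request;
    Alibaba's billing docs may define the threshold differently.
    """
    band = "short" if input_tokens + output_tokens <= 256_000 else "long"
    in_rate, out_rate = RATES[(region, band)]
    return (input_tokens * in_rate + output_tokens * out_rate) / 1_000_000

# A 100K-in / 10K-out request on the Global tier lands in the cheap band:
print(round(request_cost(100_000, 10_000), 4))  # 0.0441
```

The same helper makes the band jump concrete: a 300K-token input alone costs about $0.33 at the long-context rate, versus about $0.08 had it stayed under the threshold.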
New accounts on Alibaba Cloud get 1 million tokens free per modality for 90 days on the International tier. The model ID is `qwen3.6-plus` or `qwen3.6-plus-2026-04-02`.
What the model is
Qwen3.6-Plus uses a hybrid architecture: linear attention layers combined with sparse mixture-of-experts routing. Parameter count is not disclosed. Context window is 1,000,000 tokens. The model handles text, images, and video in a single call - multimodal is native, not bolted on. It also has a hybrid thinking mode enabled by default, meaning it can do explicit chain-of-thought reasoning or skip it depending on task complexity.
Unlike the Qwen3.5 series, this one is proprietary. No weights, no self-hosting. Alibaba has moved the flagship line to API-only deployment. The model is integrated into their enterprise product Wukong and the consumer Qwen app.
Available on Alibaba Cloud (DashScope) and OpenRouter. The OpenRouter listing notes that Alibaba collects prompt and completion data on that route - worth knowing if you're sending sensitive content.
The benchmarks, with context
Alibaba's official chart compares Qwen3.6-Plus against Claude 4.5 Opus, not Claude 4.6 Opus. That matters because on Terminal-Bench 2.0, Claude 4.6 Opus scores 65.4 - above Qwen3.6-Plus at 61.6. The chart makes the terminal benchmark look like a Qwen win (61.6 vs 59.3) when the current model would flip that result.
The multimodal results are a different story. On MMMU, RealWorldQA, OmniDocBench, and Video-MME, Qwen3.6-Plus leads Claude 4.5 Opus by meaningful margins. Claude 4.6 Opus scores are not available for those multimodal benchmarks yet.

| Benchmark | Category | Qwen3.6-Plus | Claude 4.5 Opus | Winner |
|---|---|---|---|---|
| Terminal-Bench 2.0 | Agentic coding | 61.6 | 59.3 | Qwen (4.5 only) |
| SWE-bench Verified | Agentic coding | 78.8 | 80.9 | Claude |
| SWE-bench Pro | Agentic coding | 56.6 | 57.1 | Claude |
| SWE-bench Multilingual | Agentic coding | 73.8 | 77.5 | Claude |
| NL2Repo | Long-horizon coding | 37.9 | 43.2 | Claude |
| MMMU | Multimodal reasoning | 86.0 | 80.7 | Qwen |
| RealWorldQA | Image reasoning | 85.4 | 77.6 | Qwen |
| OmniDocBench v1.5 | Document recognition | 91.2 | 87.7 | Qwen |
| Video-MME | Video reasoning | 87.8 | 77.6 | Qwen |
* Terminal-Bench 2.0: Qwen3.6-Plus scores 61.6 vs Claude 4.5 Opus 59.3, but Claude 4.6 Opus scores 65.4 - which would reverse this result. Alibaba's chart does not include Claude 4.6 Opus. QwenClawBench and QwenWebBench are Alibaba's own proprietary benchmarks and are not shown here. See The Decoder's coverage for the reproduced benchmark chart.
On multimodal tasks, Qwen3.6-Plus competes with Claude Opus at a fraction of the cost. On pure agentic coding, Claude 4.6 Opus is still ahead. Heavy on document processing, image analysis, or video? The price difference makes this worth a real eval.
Cost at scale
Three monthly scenarios using Global tier pricing for Qwen3.6-Plus (requests under 256K tokens). All models use identical token counts.
| Scenario | Volume | Qwen3.6-Plus | GPT-5.4 | Claude Opus 4.6 |
|---|---|---|---|---|
| Code review | 50M in / 15M out | $39 | $350 | $625 |
| Document intelligence | 200M in / 30M out | $105 | $950 | $1,750 |
| Agentic workflow | 500M in / 100M out | $303 | $2,750 | $5,000 |
Qwen3.6-Plus: $0.276/M input + $1.651/M output (Global tier, under 256K). GPT-5.4: $2.50/M input + $15/M output. Claude Opus 4.6: $5/M input + $25/M output. Use the TokenCost calculator for your exact numbers.
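The scenario table is straight multiplication; a quick reproduction using the rates quoted in this post, rounded to whole dollars:

```python
# (input $/M, output $/M) for each model, as quoted in this post.
PRICES = {
    "qwen3.6-plus": (0.276, 1.651),   # Global tier, under 256K
    "gpt-5.4": (2.50, 15.00),
    "claude-opus-4.6": (5.00, 25.00),
}

def monthly_cost(model: str, in_millions: float, out_millions: float) -> int:
    """Monthly spend in whole USD for a given token volume."""
    in_rate, out_rate = PRICES[model]
    return round(in_millions * in_rate + out_millions * out_rate)

# Agentic workflow scenario: 500M input / 100M output tokens.
for model in PRICES:
    print(model, monthly_cost(model, 500, 100))
# qwen3.6-plus 303
# gpt-5.4 2750
# claude-opus-4.6 5000
```

The other rows check out the same way: 50M/15M gives $39 for Qwen3.6-Plus and 200M/30M gives $105.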
For the agentic workflow scenario, Qwen3.6-Plus saves roughly $4,700/month versus Claude Opus 4.6 and $2,450/month versus GPT-5.4. At $5,000/month Claude spend, switching to Qwen3.6-Plus for tasks where quality holds up pays back quickly.
Where it makes sense
The clearest wins are multimodal at scale. Document OCR and analysis, image-heavy pipelines, video understanding - these are the workloads where Qwen3.6-Plus leads on benchmarks and where input tokens cost about 95% less. If you're running Claude Opus on these tasks today, the quality gap either doesn't exist or runs the other way.
The 1M context window at $0.276/M makes it interesting for long document summarization, as long as you stay under 256K per request to hold the lower rate. Go over 256K and you jump to $1.101/M - still cheaper than Opus, but less dramatic.
For pure agentic coding - the kind where the model navigates a repo, patches files, runs tests - Claude 4.6 Opus still has an edge on independent benchmarks. The gap on SWE-bench Verified is 80.9 vs 78.8, which might matter or not depending on the task. But on Terminal-Bench 2.0, Claude 4.6 Opus is meaningfully ahead (65.4 vs 61.6). Run evals on your actual workload before deciding.
Our read
Qwen3.6-Plus is the most cost-effective option at the frontier for multimodal workloads, and it's not close. At $0.276 versus $5.00 per million input tokens, the budget for one Claude Opus call covers roughly 18 Qwen3.6-Plus calls. For document processing, image analysis, and video tasks, the benchmarks support switching.
We'd be more cautious on agentic coding. The SWE-bench gap is small, but Alibaba presented their benchmark results against Claude 4.5 Opus specifically, and that choice is telling. The current Claude 4.6 Opus outperforms Qwen3.6-Plus on Terminal-Bench 2.0 and NL2Repo, which are better proxies for real-world coding agent work.
Privacy is the other variable. If you route through OpenRouter, Alibaba collects your prompts. For anything sensitive, route through Alibaba Cloud directly and check their data processing terms.