Model Release · April 6, 2026 · 7 min read

Qwen3.6-Plus: $0.28 per million input tokens, and the benchmark comparison Alibaba chose not to lead with

Released April 2, 2026. At $0.276 per million input tokens globally, it costs roughly 1/18th of Claude Opus 4.6. On multimodal tasks it genuinely outperforms Claude 4.5 Opus. On agentic coding, the current Claude Opus 4.6 still wins - which is why Alibaba's benchmark chart compared against the older 4.5.

Qwen3.6-Plus model release benchmark chart showing performance across agentic coding and multimodal tasks

Image source: Qwen Blog

  • Released April 2. $0.276/M input globally (Global tier, under 256K). 18x less than Claude Opus 4.6.
  • Genuinely better than Claude 4.5 Opus on documents, images, and video. Claude 4.6 Opus still wins on agentic coding.
  • Alibaba's benchmark chart used Claude 4.5, not 4.6. Alibaba collects your prompt data on the OpenRouter route.

Pricing by region and context band

The API pricing depends on where you call from and how much context you use per request. Alibaba bills differently for requests under 256K tokens versus longer ones.

Region                     Context band       Input / 1M   Output / 1M
Global (US/EU)             0 - 256K tokens    $0.276       $1.651
Global (US/EU)             256K - 1M tokens   $1.101       $6.602
International (Singapore)  0 - 256K tokens    $0.50        $3.00
International (Singapore)  256K - 1M tokens   $2.00        $6.00

For context: Claude Opus 4.6 costs $5.00/M input and $25.00/M output with no context band tiering. GPT-5.4 runs $2.50/M input and $15.00/M output. Qwen3.6-Plus on the Global tier, for requests under 256K, is the cheapest of the three by a wide margin. For 1M-context requests, the $1.101 Global input rate still undercuts Opus at $5.00 - you're paying $1.10 versus $5.00 per million tokens.
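The band logic above is easy to get wrong when budgeting, so here is a minimal cost estimator built from the rates in the table. It assumes the band is chosen by the request's input size, per the "requests under 256K tokens" wording in this post; the rates are a snapshot, not a live price sheet.

```python
# Tiered per-request cost estimator for Qwen3.6-Plus, using the
# rates quoted in this post. Assumption: the pricing band is
# selected by input-token count per request.

RATES = {  # (input $/1M tokens, output $/1M tokens)
    ("global", "short"): (0.276, 1.651),   # 0 - 256K context
    ("global", "long"):  (1.101, 6.602),   # 256K - 1M context
    ("intl",   "short"): (0.50, 3.00),
    ("intl",   "long"):  (2.00, 6.00),
}

def request_cost(input_tokens: int, output_tokens: int, region: str = "global") -> float:
    """Cost in USD for one request, picking the band from input size."""
    band = "short" if input_tokens < 256_000 else "long"
    in_rate, out_rate = RATES[(region, band)]
    return (input_tokens * in_rate + output_tokens * out_rate) / 1_000_000

# A 200K-token request stays in the cheap band; 300K jumps tiers.
print(round(request_cost(200_000, 4_000), 4))   # 0.0618
print(round(request_cost(300_000, 4_000), 4))   # 0.3567
```

Note how a 50% increase in input tokens produces nearly a 6x jump in request cost once you cross the 256K boundary.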

New accounts on Alibaba Cloud get 1 million tokens free per modality for 90 days on the International tier. The model ID is qwen3.6-plus or qwen3.6-plus-2026-04-02.

What the model is

Qwen3.6-Plus uses a hybrid architecture: linear attention layers combined with sparse mixture-of-experts routing. Parameter count is not disclosed. Context window is 1,000,000 tokens. The model handles text, images, and video in a single call - multimodal is native, not bolted on. It also has a hybrid thinking mode enabled by default, meaning it can do explicit chain-of-thought reasoning or skip it depending on task complexity.

Unlike the Qwen3.5 series, this one is proprietary. No weights, no self-hosting. Alibaba has moved the flagship line to API-only deployment. The model is integrated into their enterprise product Wukong and the consumer Qwen app.

Available on Alibaba Cloud (DashScope) and OpenRouter. The OpenRouter listing notes that Alibaba collects prompt and completion data on that route - worth knowing if you're sending sensitive content.

The benchmarks, with context

Alibaba's official chart compares Qwen3.6-Plus against Claude 4.5 Opus, not Claude 4.6 Opus. That matters because on Terminal-Bench 2.0, Claude 4.6 Opus scores 65.4 - above Qwen3.6-Plus at 61.6. The chart makes the terminal benchmark look like a Qwen win (61.6 vs 59.3) when the current model would flip that result.

The multimodal results are a different story. On MMMU, RealWorldQA, OmniDocBench, and Video-MME, Qwen3.6-Plus leads Claude 4.5 Opus by meaningful margins. Claude 4.6 Opus scores are not available for those multimodal benchmarks yet.

Qwen3.6-Plus benchmark chart vs Claude 4.5 Opus: Terminal-Bench, SWE-bench, MMMU, and multimodal tasks
Benchmark               Category              Qwen3.6-Plus  Claude 4.5 Opus  Winner
Terminal-Bench 2.0      Agentic coding        61.6          59.3             Qwen (4.5 only)*
SWE-bench Verified      Agentic coding        78.8          80.9             Claude
SWE-bench Pro           Agentic coding        56.6          57.1             Claude
SWE-bench Multilingual  Agentic coding        73.8          77.5             Claude
NL2Repo                 Long-horizon coding   37.9          43.2             Claude
MMMU                    Multimodal reasoning  86.0          80.7             Qwen
RealWorldQA             Image reasoning       85.4          77.6             Qwen
OmniDocBench v1.5       Document recognition  91.2          87.7             Qwen
Video-MME               Video reasoning       87.8          77.6             Qwen

* Terminal-Bench 2.0: Qwen3.6-Plus scores 61.6 vs Claude 4.5 Opus 59.3, but Claude 4.6 Opus scores 65.4 - which would reverse this result. Alibaba's chart does not include Claude 4.6 Opus. QwenClawBench and QwenWebBench are Alibaba's own proprietary benchmarks and are not shown here. See The Decoder's coverage for the reproduced benchmark chart.

On multimodal tasks, Qwen3.6-Plus competes with Claude Opus at a fraction of the cost. On pure agentic coding, Claude 4.6 Opus is still ahead. Heavy on document processing, image analysis, or video? The price difference makes this worth a real eval.

Cost at scale

Three monthly scenarios using Global tier pricing for Qwen3.6-Plus (requests under 256K tokens). All models use identical token counts.

Scenario               Volume              Qwen3.6-Plus  GPT-5.4  Claude Opus 4.6
Code review            50M in / 15M out    $39           $350     $625
Document intelligence  200M in / 30M out   $105          $950     $1,750
Agentic workflow       500M in / 100M out  $303          $2,750   $5,000

Qwen3.6-Plus: $0.276/M input + $1.651/M output (Global tier, under 256K). GPT-5.4: $2.50/M input + $15/M output. Claude Opus 4.6: $5/M input + $25/M output. Use the TokenCost calculator for your exact numbers.
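The scenario table can be reproduced directly from those per-million rates, which makes it easy to swap in your own token volumes:

```python
# Reproduces the monthly-scenario table above from the per-million
# rates listed in this post (Qwen on the Global <256K tier).

PRICES = {  # $ per 1M tokens: (input, output)
    "Qwen3.6-Plus":    (0.276, 1.651),
    "GPT-5.4":         (2.50, 15.00),
    "Claude Opus 4.6": (5.00, 25.00),
}

def monthly_cost(model: str, m_in: float, m_out: float) -> float:
    """Monthly USD cost for m_in / m_out million tokens."""
    in_rate, out_rate = PRICES[model]
    return m_in * in_rate + m_out * out_rate

# Agentic workflow scenario: 500M input / 100M output per month.
for model in PRICES:
    print(model, round(monthly_cost(model, 500, 100)))
# Qwen3.6-Plus 303, GPT-5.4 2750, Claude Opus 4.6 5000
```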

For the agentic workflow scenario, Qwen3.6-Plus saves roughly $4,700/month versus Claude Opus 4.6 and $2,450/month versus GPT-5.4. At $5,000/month Claude spend, switching to Qwen3.6-Plus for tasks where quality holds up pays back quickly.

Where it makes sense

The clearest wins are multimodal at scale. Document OCR and analysis, image-heavy pipelines, video understanding - these are the workloads where Qwen3.6-Plus leads on benchmarks and where input tokens cost about 95% less. If you're running Claude Opus on these tasks today, the quality gap either doesn't exist or runs the other way.

The 1M context window at $0.276/M makes it interesting for long document summarization, as long as you stay under 256K per request to hold the lower rate. Go over 256K and you jump to $1.101/M - still cheaper than Opus, but less dramatic.
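To see what that tier jump means in dollars, here is a quick comparison of one 500K-token summarization call versus splitting the same document into two ~250K chunks that each stay in the cheaper band. This is pure rate arithmetic from the table above; it ignores the overlap tokens you would add between chunks and any quality cost of chunking.

```python
# One long-band call vs. two short-band calls for the same 500K-token
# document (Global-tier rates from this post; 8K output total).

SHORT_IN, LONG_IN = 0.276, 1.101    # $ per 1M input tokens
SHORT_OUT, LONG_OUT = 1.651, 6.602  # $ per 1M output tokens

one_shot = (500_000 * LONG_IN + 8_000 * LONG_OUT) / 1e6
chunked = 2 * (250_000 * SHORT_IN + 4_000 * SHORT_OUT) / 1e6

print(round(one_shot, 3))  # 0.603
print(round(chunked, 3))   # 0.151
```

Staying under the 256K boundary cuts the cost of this request roughly 4x, which is why the band matters for long-document pipelines.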

For pure agentic coding - the kind where the model navigates a repo, patches files, runs tests - Claude 4.6 Opus still has an edge on independent benchmarks. The gap on SWE-bench Verified is 80.9 vs 78.8, which might matter or not depending on the task. But on Terminal-Bench 2.0, Claude 4.6 Opus is meaningfully ahead (65.4 vs 61.6). Run evals on your actual workload before deciding.

Our read

Qwen3.6-Plus is the most cost-effective option at the frontier for multimodal workloads, and it's not close. At $0.276 versus $5.00 per million input tokens, the budget for one Claude Opus call covers roughly 18 Qwen3.6-Plus calls. For document processing, image analysis, and video tasks, the benchmarks support switching.

We'd be more cautious on agentic coding. The SWE-bench gap is small, but Alibaba presented their benchmark results against Claude 4.5 Opus specifically, and that choice is telling. The current Claude 4.6 Opus outperforms Qwen3.6-Plus on Terminal-Bench 2.0 and NL2Repo, which are better proxies for real-world coding agent work.

Privacy is the other variable. If you route through OpenRouter, Alibaba collects your prompts. For anything sensitive, route through Alibaba Cloud directly and check their data processing terms.

Sources