Skip to main content
TokenCost logoTokenCost
Model ReleaseMay 28, 2026·9 min read

Qwen3.7 Max ties Opus 4.7 on intelligence and beats it on agentic coding, at half the input price. The catch: closed weights and no images.

Alibaba announced Qwen3.7 Max on May 20 at $2.50 input and $7.50 output per million tokens, against $5/$25 on Claude Opus 4.7 and $5/$30 on GPT-5.5. It lands within a point of Opus on the Artificial Analysis Intelligence Index, edges ahead on SWE-Bench Pro and Terminal-Bench, and quadruples the old Qwen flagship's context to a full million tokens. The detail that makes it interesting for anyone running Claude Code: it speaks the Anthropic Messages API natively. Here is the real cost math, where GPT-5.5 still wins, and the two things the launch glossed over.

Abstract blue fiber-optic light strands on a dark background representing token throughput

Photo by S A on Unsplash

The list price is half of Opus on input, under a third on output

The standing rate is $2.50 input, $7.50 output, $0.25 cached input per million tokens. That cache number is a 90% discount and it matters for any agent that replays a stable system prompt. Watch the promo: OpenRouter is showing $1.25 input and $3.75 output right now, but it labels that a 50% launch discount. Quote the list rate if you are budgeting past the promo window, because nobody has said how long the discount holds.

ModelInput / 1MOutput / 1MCached inContext
Qwen3.7 Max (list)$2.50$7.50$0.251M
Qwen3.6 Max Preview$1.30$7.80~$0.13256K
Claude Opus 4.7$5.00$25.00$0.501M
GPT-5.5$5.00$30.00$0.501M
Gemini 3.1 Pro$2.00*$12.00*n/p1M

*Gemini 3.1 Pro jumps to $4 input and $18 output above a 200K-token prompt, so its edge over Qwen3.7 Max on input evaporates the moment you load a large repo. One more thing on the Opus column: Anthropic's 4.7 tokenizer counts up to about 35% more tokens than the previous Claude generation on identical text, so the effective Opus bill tends to run above its headline per-token rate. We dug into that in the Opus 4.7 tokenizer breakdown.

It wins the agentic boards and loses the intelligence race

Artificial Analysis puts Qwen3.7 Max at an Intelligence Index of 56.6, fifth overall and the top Chinese model. That is within a point of Opus 4.7 (57.3) and a few points behind GPT-5.5 (60.2). So it does not top the leaderboard. Where it does win is the agentic coding cluster: SWE-Bench Pro, Terminal-Bench, and MCP-Atlas all land above Opus 4.7, which is the model most teams were paying five dollars an input-million to beat.

BenchmarkQwen3.7 MaxOpus 4.7GPT-5.5
AA Intelligence Index56.657.360.2
GPQA Diamond92.491.393.6
SWE-Bench Pro60.657.3n/p
Terminal-Bench 2.069.765.4n/p
MCP-Atlas76.475.8n/p
HMMT Feb 202697.1n/pn/p

Read the agentic rows with one caveat. Alibaba reports a low attempt rate, around 48% on the AA-Omniscience hallucination test, which means the model abstains more than its rivals rather than answering and getting it wrong. That is good for factual safety and slightly flattering on accuracy-only charts. Alibaba also claims a single 35-hour autonomous run with 1,158 tool calls. That is the lab's own number, unverified by third parties, so treat it as a marketing figure rather than a guarantee.

The 1M context and the Claude Code drop-in are the real upgrade

The Qwen3.6 Max Preview capped out at 256K tokens, which kept it off whole-repo agentic work. Qwen3.7 Max takes that to a full million, matching Opus 4.7, GPT-5.5, and Gemini 3.1 Pro on context. Max output sits at 65,536 tokens, double the old 32K ceiling, so long-form report agents are less likely to truncate.

The shareable part is the harness story. Qwen3.7 Max exposes a native Anthropic Messages API endpoint, in preview, on top of the usual OpenAI-compatible one. Point your ANTHROPIC_BASE_URL at the DashScope endpoint and a Claude Code or OpenClaw setup runs Qwen3.7 Max with no wrapper. For teams that built their whole workflow around Claude Code but flinch at the Opus bill, that is the first time a frontier-class Chinese model slots in without a translation layer. The preview label is worth respecting, though: the compatibility could shift before it goes GA.

One limit to plan around. Qwen3.7 Max is text-only. No image input, despite some launch coverage implying otherwise. If your agent reads screenshots, design mocks, or PDFs as images, this is not a drop-in for that part of the pipeline, and you will keep a multimodal model in the loop for vision.

What four real workloads actually cost

List rates, no promo discount, so the numbers hold after the launch window closes. The whole-repo row pushes a 400K-token prompt through each model, which is where Gemini 3.1 Pro flips to its higher tier and where the 256K Qwen3.6 Max Preview could not play at all.

WorkloadQwen3.7 MaxOpus 4.7GPT-5.5Gemini 3.1 Pro
Coding agent step (50K in / 10K out)$0.200$0.500$0.550$0.220
Big-diff PR review (150K in / 8K out)$0.435$0.950$0.990$0.396
Whole-repo pass (400K in / 20K out)$1.150$2.500$2.600$1.960
1B tokens a month (70/30 blend)$4,000$11,000$12,500$5,000

Gemini 3.1 Pro is the only model that beats Qwen3.7 Max on any row, and it does it only on the big-diff review, a read-heavy prompt that barely generates anything, so its cheaper input token wins. Add output, or push the prompt past 200K where Gemini flips to its $4/$18 tier, and Qwen takes the lead back. The monthly figure for Gemini assumes most calls stay under that 200K line, so treat $5,000 as a floor. At the billion-token tier Opus 4.7 runs about 2.75 times the Qwen3.7 Max bill and GPT-5.5 a bit over three. If you only want the absolute cheapest credible option, DeepSeek V4-Pro at $0.435/$0.87 still owns that floor, and Qwen3.7 Max is not trying to beat it. It is going after the slot just under Opus, where the quality is close and the bill is a third the size.

When to route here, and when not to

High-volume agentic coding is the strongest case. If you run Claude Code or a similar harness and your bill comes from agent loops rather than a handful of high-stakes calls, Qwen3.7 Max gives you Opus-class agentic scores at a third of the cost, and the native Anthropic endpoint turns the switch into a base-URL change. The 1M context and 65K output ceiling stretch that to whole-repo work the 256K Qwen3.6 Max Preview could not touch, cheaper than every frontier rival bar DeepSeek.

The case weakens where raw intelligence is the bar. A 3.6-point Intelligence Index gap and the GPQA lead are real, so for research, hard reasoning, or anywhere one wrong answer is expensive, GPT-5.5 or Opus 4.7 still earn the premium. Two hard limits rule it out for some teams outright: it is text-only, so an image-reading agent still needs a multimodal model in the loop, and it is closed-weights, so self-hosting, air-gapped deployment, or a price hedge you control are all off the table.

And mind the promo if you are budgeting. The $1.25/$3.75 on OpenRouter is a launch discount, not the rate. Model your spend on $2.50/$7.50 so a quiet expiry does not double your bill overnight.

The pattern Alibaba is repeating

Qwen3.7 Max confirms the closed-weights turn that the 3.6 Max Preview started. Alibaba is keeping its flagship behind an API and pricing it to undercut the western frontier rather than the Chinese budget tier. The bet is that Opus-adjacent quality at a third of the price, plus a frictionless Claude Code path, pulls in the developers who were never going to self-host anyway. On the numbers here, that bet looks reasonable.

We track all of these on the TokenCost pricing page and the per-task math is in the calculator. For the prior closed Qwen flagship, see the Qwen3.6 Max Preview breakdown.

Sources