TokenCost
Model Release · May 11, 2026 · 9 min read

Tencent's Hunyuan HY3 Preview is the cheapest frontier-class model, and it's 14 points behind the leaders on coding

The headline most coverage led with was "$0.07 per million tokens, frontier benchmarks." Two things are wrong with that headline. The price is real but it has two versions: $0.066 on OpenRouter, $0.17 on Tencent Cloud direct. The benchmarks are not frontier; HY3 Preview lands at 74.4 percent on SWE-Bench Verified, which puts it 14 points behind GPT-5.5 and Opus 4.7. The interesting question is the one nobody is asking: at 75 to 96 times cheaper than Opus 4.7, what does a 14-point quality gap actually cost you?

[Image: glowing 3D network of translucent cube nodes on black, evoking a frontier AI model's neural architecture. Photo by Shubham Dhage on Unsplash.]

HY3 Preview is the result of Tencent rebuilding its pretraining and RL infrastructure from scratch, now led by former OpenAI researcher Yao Shunyu. Weights dropped April 22-24, 2026 on Hugging Face, GitHub, ModelScope, and GitCode. The architecture is a 295B-parameter Mixture-of-Experts with 21B active, 256K context, and the license is custom (Hy Community License, not OSI-open). The cost numbers and the benchmark numbers tell different stories. Both stories are worth knowing before you route a workload to it.

The two prices, and why they disagree by 2.5x

Most of the launch coverage cited a single price. There are actually two, depending on where you call the model from. The discrepancy is large enough to flip routing decisions.

| Provider | Input / 1M | Output / 1M | Notes |
| --- | --- | --- | --- |
| OpenRouter | $0.066 | $0.26 | 262K ctx; routed via subsidised providers |
| Tencent Cloud (direct) | ~$0.17 (RMB 1.2) | ~$0.55 (RMB 4) | 2-week free launch window in May |
| OpenRouter (free tier) | $0.00 | $0.00 | Rate-limited; not for production |

The Tencent Cloud direct price reflects what it actually costs Tencent to serve the model on their own infrastructure in USD-equivalent terms. The OpenRouter price is roughly a third of that, which means some routed provider (likely a Chinese inference shop running on subsidised GPUs) is eating margin to win volume. The free tier exists but rate-limits hard; treat it as a sampling option, not a deployment target.

For comparison: GPT-5.5 lists at $5.00 input / $30.00 output. Opus 4.7 lists at $5.00 / $25.00. Gemini 3.1 Pro at $2.00 / $12.00 (under 200K). HY3 on OpenRouter is 76 times cheaper than GPT-5.5 on input and 115 times cheaper on output. On a blended 70/30 input/output ratio, the per-million-token cost is roughly $0.124 for HY3 OR, versus $11.00 for Opus 4.7. That is the headline cost gap.
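The blended figure is a one-line calculation; a minimal sketch using the list prices quoted above (illustrative snapshots, not live rates):

```python
# Blended per-million-token cost at a given input/output mix.
def blended_cost(input_price: float, output_price: float,
                 input_share: float = 0.7) -> float:
    """Cost per 1M tokens, weighting input vs output by traffic share."""
    return input_share * input_price + (1 - input_share) * output_price

prices = {                         # (input $/1M, output $/1M)
    "HY3 (OpenRouter)": (0.066, 0.26),
    "HY3 (Tencent direct)": (0.17, 0.55),
    "Opus 4.7": (5.00, 25.00),
    "GPT-5.5": (5.00, 30.00),
}

for model, (inp, out) in prices.items():
    print(f"{model}: ${blended_cost(inp, out):.3f} per 1M blended tokens")
```

Changing `input_share` shifts the multiples: output-heavy workloads widen HY3's advantage, since the output-price gap (115x) is larger than the input-price gap (76x).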

Where HY3 actually sits on the leaderboard

Tencent published HY3 Preview's scores against a small set of public benchmarks. What it did not publish, notably, is AIME 2024 or AIME 2025, which most frontier launches now include by default. The HLE score carries an asterisk in the official table. The scores that are clean look like this:

| Benchmark | HY3 Preview | GPT-5.5 | Opus 4.7 | DeepSeek V4-Pro |
| --- | --- | --- | --- | --- |
| SWE-Bench Verified | 74.4% | 88.7% | 87.6% | 83.7% |
| Terminal-Bench 2.0 | 54.4% | n/p | n/p | n/p |
| GPQA Diamond | 87.2 | 93.6 | 94.2 | 89.1 |
| MMLU-Pro | 65.8 | 83.2 | ~82 | 78.9 |
| LiveCodeBench v6 | 34.9 | ~78 | ~75 | ~68 |

(n/p = not published.)

On coding (the benchmark category that maps most directly to revenue for tools like Cursor, Copilot, and Claude Code), the gap is the largest. 14 points on SWE-Bench Verified is the difference between a model that finishes the task and one that partially finishes it. Multi-step coding agents amplify the gap: a 74% pass rate per subtask collapses to about 22% on a five-step plan, versus 53% for an 88% model. That math is unforgiving for autonomous coding work.
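The compounding claim is easy to check. A minimal sketch, assuming independent steps (a simplification; real agent steps correlate, but the direction of the effect holds):

```python
# Probability a multi-step plan succeeds when every subtask must pass,
# under the (simplifying) assumption that step outcomes are independent.
def plan_success(step_pass_rate: float, steps: int) -> float:
    return step_pass_rate ** steps

print(f"74% model, 5 steps: {plan_success(0.74, 5):.0%}")  # ~22%
print(f"88% model, 5 steps: {plan_success(0.88, 5):.0%}")  # ~53%
```

The exponent is the killer: every extra autonomous step multiplies in another factor of the per-step pass rate, so a per-step gap of 14 points becomes a 30-point gap on the finished plan.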

On GPQA Diamond (graduate science) HY3 is 6-7 points behind. On MMLU-Pro it is 16-17 points behind. The model that Tencent emphasises in its launch material is one that wins on cost, not on quality. The framing as "frontier-class" is editorial shorthand for "in the same context-length and capability category," not for "tied on accuracy."

Cost per benchmark point is the calculation that flips

Take the blended per-million-token cost and divide by SWE-Bench Verified score. The ratio measures how much you pay per percentage point of measured coding quality. The HY3 OpenRouter number is small enough to require an extra decimal.

| Model | Blended $/1M (70/30) | SWE-Bench Verified | Cents per point per 1M |
| --- | --- | --- | --- |
| HY3 (OpenRouter) | $0.124 | 74.4% | 0.17¢ |
| HY3 (Tencent direct) | $0.284 | 74.4% | 0.38¢ |
| Kimi K2.6 | $1.87 | 76.8% | 2.4¢ |
| DeepSeek V4-Pro (post-promo) | $2.26 | 83.7% | 2.7¢ |
| Claude Opus 4.7 | $11.00 | 87.6% | 12.6¢ |
| GPT-5.5 | $12.50 | 88.7% | 14.1¢ |

HY3 on OpenRouter buys roughly 75 times more measured coding quality per dollar than Opus 4.7. The catch is that the absolute quality ceiling sits 14 points lower. The math only works when 74% is enough, or when you have a cheap verification step (compile, run tests, lint) that catches the misses. For agentic loops where each step has to clear a high bar before the next one fires, the cost-per-point advantage is a trap: a 26% failure rate compounds into a 70%+ failure rate over five steps.
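The cost-per-point column is just blended spend divided by score; a minimal sketch reproducing three rows of the table from their two inputs:

```python
# Cents of blended spend per SWE-Bench Verified point, per 1M tokens.
def cents_per_point(blended_usd_per_1m: float, swe_bench_pct: float) -> float:
    return blended_usd_per_1m * 100 / swe_bench_pct

rows = {                           # (blended $/1M, SWE-Bench %)
    "HY3 (OpenRouter)": (0.124, 74.4),
    "Claude Opus 4.7": (11.00, 87.6),
    "GPT-5.5": (12.50, 88.7),
}

for model, (cost, score) in rows.items():
    print(f"{model}: {cents_per_point(cost, score):.2f}¢ per point per 1M")
```

Note the metric's blind spot: it is linear in score, but agentic value is not. Dividing by the five-step plan-success rate instead of the raw benchmark score would flip several of these rankings.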

Five workload shapes, run end to end

Bills computed on the same request shapes used in our GPT-5.5 vs Opus 4.7 vs Gemini 3.1 Pro comparison, extended with both HY3 prices and DeepSeek V4-Pro at its post-promo rate (the rate most forecasts should be using, since the launch promo expires May 31).

| Workload | HY3 (OR) | HY3 (Tencent) | V4-Pro (post-promo) | Opus 4.7 | GPT-5.5 |
| --- | --- | --- | --- | --- | --- |
| Casual code (50K in / 10K out) | $0.006 | $0.014 | $0.12 | $0.50 | $0.55 |
| Mid refactor (200K in / 50K out) | $0.026 | $0.062 | $0.52 | $2.25 | $2.50 |
| Repo-scale agent (500K in / 100K out) | $0.059 | $0.14 | $1.22 | $5.00 | $5.50 |
| 1B tokens / month (70/30 blend) | $124 | $284 | $2,262 | $11,000 | $12,500 |
| 10B tokens / month (heavy use) | $1,240 | $2,840 | $22,620 | $110,000 | $125,000 |

At 10 billion tokens per month, the gap between HY3 on OpenRouter and GPT-5.5 is about $124,000 every 30 days, or $1.5 million annualised. The decision becomes: is a 14-point coding gap on SWE-Bench worth $1.5 million per year? For most production workloads that hide behind a verification step (CI, tests, code review), the answer is no. For autonomous coding agents without verification, the answer is yes, and then some.
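Each cell in the table above is the same two-term sum; a minimal sketch for one request shape, using the list prices quoted earlier:

```python
# End-to-end bill for one request shape at per-1M-token list prices.
def workload_cost(tokens_in: int, tokens_out: int,
                  price_in: float, price_out: float) -> float:
    return tokens_in / 1e6 * price_in + tokens_out / 1e6 * price_out

# Repo-scale agent run: 500K tokens in, 100K tokens out.
hy3 = workload_cost(500_000, 100_000, 0.066, 0.26)   # HY3 via OpenRouter
gpt = workload_cost(500_000, 100_000, 5.00, 30.00)   # GPT-5.5 list price
print(f"HY3 (OR): ${hy3:.3f}   GPT-5.5: ${gpt:.2f}")
```

Scaling the same shape to monthly volume is multiplication, which is why the gap at 10B tokens/month is just the per-run gap times the run count.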

The license is not what most people think it is

HY3 Preview ships with downloadable weights, and the launch coverage uses "open-weights" and "open-source" interchangeably. They are not the same thing. The actual license is the Tencent Hy Community License Agreement, which is a custom Tencent document, not OSI-approved. Two clauses matter for commercial deployment:

  • Commercial use threshold. Entities with monthly active users above a defined cap must obtain a separate commercial license from Tencent. The cap mirrors the Llama community license structure and excludes hyperscalers and large platforms from default permission.
  • Derived models. Restrictions apply to redistributing fine-tunes and on naming conventions for downstream models. Read the full license before merging weights into a product shipped under a different brand.

For internal tools, side projects, and most startups under the threshold, the license is permissive enough to treat as effectively open. For platforms above the threshold, the license is closer to a paid commercial agreement that happens to include the weights. Either way, it is not Apache 2.0, and the difference will matter at some point during a procurement review. The Kimi K2.6 modified-MIT and GLM-5 MIT licenses, by contrast, do not carry these restrictions.

Where to actually route HY3, in one paragraph each

For bulk content extraction, summarisation, classification, and data labeling at scale, HY3 is the cheapest credible option in the 256K-context tier. The 74% SWE-Bench score is a coding benchmark, not a general-purpose proxy; on summarisation and classification the gap to frontier models is smaller. Run a small accuracy eval on your task type before committing, and if the eval clears the bar, the 75x cost advantage is real.

For coding assistance where a human reviews every diff, HY3 is workable on simple edits and dangerous on autonomous multi-step plans. The 14-point SWE-Bench gap compounds. Use it for grep-with-reasoning, draft generation, or first-pass code review, and route the actual edits to Opus 4.7 or GPT-5.5. The cost-blended stack (cheap reads, expensive writes) lands somewhere between $2 and $5 per million blended tokens.

For autonomous agents (CodeBuddy, Claude Code, Codex, anything that closes a loop without human review), HY3 is not the right pick today. Tencent itself flags the 495-step CodeBuddy agent in its product material, but those flows run on internal Hunyuan tooling with custom verification. Outside that environment, the base-model error rate will eat any cost savings within a few iterations. Wait for HY3 Full (preview implies a follow-up), or use it as a cheap retrieval layer beneath a more expensive reasoning model.
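The cost-blended stack from the coding-assistance paragraph can be sanity-checked with one more weighted sum. A minimal sketch, assuming (hypothetically) that 80% of tokens go to HY3 for reads and 20% to Opus 4.7 for writes, at the blended per-1M rates quoted earlier:

```python
# Blended rate for a two-tier stack: cheap model for reads,
# expensive model for writes. The 80/20 split is an assumption
# for illustration, not a measured traffic ratio.
def stack_cost(cheap_share: float, cheap_rate: float,
               expensive_rate: float) -> float:
    return cheap_share * cheap_rate + (1 - cheap_share) * expensive_rate

print(f"${stack_cost(0.80, 0.124, 11.00):.2f} per 1M blended tokens")
```

At an 80/20 split this lands near the low end of the $2-$5 range; pushing more edit traffic to Opus 4.7 moves it toward the high end.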

Sources