TokenCost
Model Release · May 11, 2026 · 9 min read

Tencent's Hunyuan HY3 Preview is the cheapest frontier-class model, and it's 14 points behind the leaders on coding

The headline most coverage led with was "$0.07 per million tokens, frontier benchmarks." Two things are wrong with that headline. The price is real but it has two versions: $0.066 on OpenRouter, $0.17 on Tencent Cloud direct. The benchmarks are not frontier; HY3 Preview lands at 74.4 percent on SWE-Bench Verified, which puts it 14 points behind GPT-5.5 and Opus 4.7. The interesting question is the one nobody is asking: at 75 to 96 times cheaper than Opus 4.7, what does a 14-point quality gap actually cost you?

[Image: glowing 3D network of translucent cube nodes on black, evoking a frontier AI model's neural architecture. Photo by Shubham Dhage on Unsplash.]

HY3 Preview is the result of Tencent rebuilding its pretraining and RL infrastructure from scratch, now led by former OpenAI researcher Yao Shunyu. Weights dropped April 22-24, 2026 on Hugging Face, GitHub, ModelScope, and GitCode. The architecture is a 295B-parameter Mixture-of-Experts with 21B active, 256K context, and the license is custom (Hy Community License, not OSI-open). The cost numbers and the benchmark numbers tell different stories. Both stories are worth knowing before you route a workload to it.

The two prices, and why they disagree by 2.5x

Most of the launch coverage cited a single price. There are actually two, depending on where you call the model from. The discrepancy is large enough to flip routing decisions.

| Provider | Input / 1M | Output / 1M | Notes |
| --- | --- | --- | --- |
| OpenRouter | $0.066 | $0.26 | 262K ctx; routed via subsidised providers |
| Tencent Cloud (direct) | ~$0.17 (RMB 1.2) | ~$0.55 (RMB 4) | 2-week free launch window in May |
| OpenRouter (free tier) | $0.00 | $0.00 | Rate-limited; not for production |

The Tencent Cloud direct price reflects what it actually costs Tencent to serve the model on their own infrastructure in USD-equivalent terms. The OpenRouter price is roughly a third of that, which means some routed provider (likely a Chinese inference shop running on subsidised GPUs) is eating margin to win volume. The free tier exists but rate-limits hard; treat it as a sampling option, not a deployment target.

For comparison: GPT-5.5 lists at $5.00 input / $30.00 output. Opus 4.7 lists at $5.00 / $25.00. Gemini 3.1 Pro at $2.00 / $12.00 (under 200K). HY3 on OpenRouter is 76 times cheaper than GPT-5.5 on input and 115 times cheaper on output. On a blended 70/30 input/output ratio, the per-million-token cost is roughly $0.124 for HY3 OR, versus $11.00 for Opus 4.7. That is the headline cost gap.
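The blended figure is a one-line calculation; a minimal sketch using the list prices quoted above (illustrative snapshots, not live rates):

```python
# Blended per-million-token cost at a given input/output mix.
def blended_cost(input_price: float, output_price: float,
                 input_share: float = 0.7) -> float:
    """Cost per 1M tokens, weighting input vs output by traffic share."""
    return input_share * input_price + (1 - input_share) * output_price

prices = {                         # (input $/1M, output $/1M)
    "HY3 (OpenRouter)": (0.066, 0.26),
    "HY3 (Tencent direct)": (0.17, 0.55),
    "Opus 4.7": (5.00, 25.00),
    "GPT-5.5": (5.00, 30.00),
}

for model, (inp, out) in prices.items():
    print(f"{model}: ${blended_cost(inp, out):.3f} per 1M blended tokens")
```

Changing `input_share` shifts the multiples: output-heavy workloads widen HY3's advantage, since the output-price gap (115x) is larger than the input-price gap (76x).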

Where HY3 actually sits on the leaderboard

Tencent published HY3 Preview's scores against a small set of public benchmarks. What it did not publish, notably, is AIME 2024 or AIME 2025, which most frontier launches now include by default. The HLE score carries an asterisk in the official table. The scores that are clean look like this:

| Benchmark | HY3 Preview | GPT-5.5 | Opus 4.7 | DeepSeek V4-Pro |
| --- | --- | --- | --- | --- |
| SWE-Bench Verified | 74.4% | 88.7% | 87.6% | 83.7% |
| Terminal-Bench 2.0 | 54.4% | n/p | n/p | n/p |
| GPQA Diamond | 87.2 | 93.6 | 94.2 | 89.1 |
| MMLU-Pro | 65.8 | 83.2 | ~82 | 78.9 |
| LiveCodeBench v6 | 34.9 | ~78 | ~75 | ~68 |

(n/p = not published.)

On coding (the benchmark category that maps most directly to revenue for tools like Cursor, Copilot, and Claude Code), the gap is the largest. 14 points on SWE-Bench Verified is the difference between a model that finishes the task and one that partially finishes it. Multi-step coding agents amplify the gap: a 74% pass rate per subtask collapses to about 22% on a five-step plan, versus 53% for an 88% model. That math is unforgiving for autonomous coding work.
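The compounding claim is easy to check. A minimal sketch, assuming independent steps (a simplification; real agent steps correlate, but the direction of the effect holds):

```python
# Probability a multi-step plan succeeds when every subtask must pass,
# under the (simplifying) assumption that step outcomes are independent.
def plan_success(step_pass_rate: float, steps: int) -> float:
    return step_pass_rate ** steps

print(f"74% model, 5 steps: {plan_success(0.74, 5):.0%}")  # ~22%
print(f"88% model, 5 steps: {plan_success(0.88, 5):.0%}")  # ~53%
```

The exponent is the killer: every extra autonomous step multiplies in another factor of the per-step pass rate, so a per-step gap of 14 points becomes a 30-point gap on the finished plan.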

On GPQA Diamond (graduate science) HY3 is 6-7 points behind. On MMLU-Pro it is 16-17 points behind. The model that Tencent emphasises in its launch material is one that wins on cost, not on quality. The framing as "frontier-class" is editorial shorthand for "in the same context-length and capability category," not for "tied on accuracy."

Cost per benchmark point is the calculation that flips

Take the blended per-million-token cost and divide by SWE-Bench Verified score. The ratio measures how much you pay per percentage point of measured coding quality. The HY3 OpenRouter number is small enough to require an extra decimal.

| Model | Blended $/1M (70/30) | SWE-Bench Verified | Cents per point per 1M |
| --- | --- | --- | --- |
| HY3 (OpenRouter) | $0.124 | 74.4% | 0.17¢ |
| HY3 (Tencent direct) | $0.284 | 74.4% | 0.38¢ |
| Kimi K2.6 | $1.87 | 76.8% | 2.4¢ |
| DeepSeek V4-Pro (post-promo) | $2.26 | 83.7% | 2.7¢ |
| Claude Opus 4.7 | $11.00 | 87.6% | 12.6¢ |
| GPT-5.5 | $12.50 | 88.7% | 14.1¢ |

HY3 on OpenRouter buys roughly 75 times more measured coding quality per dollar than Opus 4.7. The catch is that the absolute quality ceiling sits 14 points lower. The math only works when 74% is enough, or when you have a cheap verification step (compile, run tests, lint) that catches the misses. For agentic loops where each step has to clear a high bar before the next one fires, the cost-per-point advantage is a trap: a 26% failure rate compounds into a 70%+ failure rate over five steps.
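The cost-per-point column is just blended spend divided by score; a minimal sketch reproducing three rows of the table from their two inputs:

```python
# Cents of blended spend per SWE-Bench Verified point, per 1M tokens.
def cents_per_point(blended_usd_per_1m: float, swe_bench_pct: float) -> float:
    return blended_usd_per_1m * 100 / swe_bench_pct

rows = {                           # (blended $/1M, SWE-Bench %)
    "HY3 (OpenRouter)": (0.124, 74.4),
    "Claude Opus 4.7": (11.00, 87.6),
    "GPT-5.5": (12.50, 88.7),
}

for model, (cost, score) in rows.items():
    print(f"{model}: {cents_per_point(cost, score):.2f}¢ per point per 1M")
```

Note the metric's blind spot: it is linear in score, but agentic value is not. Dividing by the five-step plan-success rate instead of the raw benchmark score would flip several of these rankings.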

Five workload shapes, run end to end

Bills computed on the same request shapes used in our GPT-5.5 vs Opus 4.7 vs Gemini 3.1 Pro comparison, extended with both HY3 prices and DeepSeek V4-Pro at its post-promo rate (the rate most forecasts should be using, since the launch promo expires May 31).

| Workload | HY3 (OR) | HY3 (Tencent) | V4-Pro (post-promo) | Opus 4.7 | GPT-5.5 |
| --- | --- | --- | --- | --- | --- |
| Casual code (50K in / 10K out) | $0.006 | $0.014 | $0.12 | $0.50 | $0.55 |
| Mid refactor (200K in / 50K out) | $0.026 | $0.062 | $0.52 | $2.25 | $2.50 |
| Repo-scale agent (500K in / 100K out) | $0.059 | $0.14 | $1.22 | $5.00 | $5.50 |
| 1B tokens / month (70/30 blend) | $124 | $284 | $2,262 | $11,000 | $12,500 |
| 10B tokens / month (heavy use) | $1,240 | $2,840 | $22,620 | $110,000 | $125,000 |

At 10 billion tokens per month, the gap between HY3 on OpenRouter and GPT-5.5 is about $124,000 every 30 days, or $1.5 million annualised. The decision becomes: is a 14-point coding gap on SWE-Bench worth $1.5 million per year? For most production workloads that hide behind a verification step (CI, tests, code review), the answer is no. For autonomous coding agents without verification, the answer is yes, and then some.
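Each cell in the table above is the same two-term sum; a minimal sketch for one request shape, using the list prices quoted earlier:

```python
# End-to-end bill for one request shape at per-1M-token list prices.
def workload_cost(tokens_in: int, tokens_out: int,
                  price_in: float, price_out: float) -> float:
    return tokens_in / 1e6 * price_in + tokens_out / 1e6 * price_out

# Repo-scale agent run: 500K tokens in, 100K tokens out.
hy3 = workload_cost(500_000, 100_000, 0.066, 0.26)   # HY3 via OpenRouter
gpt = workload_cost(500_000, 100_000, 5.00, 30.00)   # GPT-5.5 list price
print(f"HY3 (OR): ${hy3:.3f}   GPT-5.5: ${gpt:.2f}")
```

Scaling the same shape to monthly volume is multiplication, which is why the gap at 10B tokens/month is just the per-run gap times the run count.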

The license is not what most people think it is

HY3 Preview ships with downloadable weights, and the launch coverage uses "open-weights" and "open-source" interchangeably. They are not the same thing. The actual license is the Tencent Hy Community License Agreement, which is a custom Tencent document, not OSI-approved. Two clauses matter for commercial deployment:

  • Commercial use threshold. Entities with monthly active users above a defined cap must obtain a separate commercial license from Tencent. The cap mirrors the Llama community license structure and excludes hyperscalers and large platforms from default permission.
  • Derived models. Restrictions apply to redistributing fine-tunes and on naming conventions for downstream models. Read the full license before merging weights into a product shipped under a different brand.

For internal tools, side projects, and most startups under the threshold, the license is permissive enough to treat as effectively open. For platforms above the threshold, the license is closer to a paid commercial agreement that happens to include the weights. Either way, it is not Apache 2.0, and the difference will matter at some point during a procurement review. The Kimi K2.6 modified-MIT and GLM-5 MIT licenses, by contrast, do not carry these restrictions.

Where to actually route HY3, in one paragraph each

For bulk content extraction, summarisation, classification, and data labeling at scale, HY3 is the cheapest credible option in the 256K-context tier. The 74% SWE-Bench score is a coding benchmark, not a general-purpose proxy; on summarisation and classification the gap to frontier models is smaller. Run a small accuracy eval on your task type before committing, and if the eval clears the bar, the 75x cost advantage is real.

For coding assistance where a human reviews every diff, HY3 is workable on simple edits and dangerous on autonomous multi-step plans. The 14-point SWE-Bench gap compounds. Use it for grep-with-reasoning, draft generation, or first-pass code review, and route the actual edits to Opus 4.7 or GPT-5.5. The cost-blended stack (cheap reads, expensive writes) lands somewhere between $2 and $5 per million blended tokens.

For autonomous agents (CodeBuddy, Claude Code, Codex, anything that closes a loop without human review), HY3 is not the right pick today. Tencent itself flags the 495-step CodeBuddy agent in its product material, but those flows run on internal Hunyuan tooling with custom verification. Outside that environment, the base-model error rate will eat any cost savings within a few iterations. Wait for HY3 Full (preview implies a follow-up), or use it as a cheap retrieval layer beneath a more expensive reasoning model.
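The cost-blended stack from the coding-assistance paragraph can be sanity-checked with one more weighted sum. A minimal sketch, assuming (hypothetically) that 80% of tokens go to HY3 for reads and 20% to Opus 4.7 for writes, at the blended per-1M rates quoted earlier:

```python
# Blended rate for a two-tier stack: cheap model for reads,
# expensive model for writes. The 80/20 split is an assumption
# for illustration, not a measured traffic ratio.
def stack_cost(cheap_share: float, cheap_rate: float,
               expensive_rate: float) -> float:
    return cheap_share * cheap_rate + (1 - cheap_share) * expensive_rate

print(f"${stack_cost(0.80, 0.124, 11.00):.2f} per 1M blended tokens")
```

At an 80/20 split this lands near the low end of the $2-$5 range; pushing more edit traffic to Opus 4.7 moves it toward the high end.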

Sources