Skip to main content
TokenCost logoTokenCost
Model ReleaseJuly 4, 2026·8 min read

Meituan's LongCat-2.0 costs 75 cents a million and tops the open coder benchmarks. You can't download it yet, and nobody outside Meituan has rerun the scores.

The food-delivery giant open-sourced LongCat-2.0 on June 30 at $0.75 input and $2.95 output per million tokens, with a launch promo that halves it again. It claims to nose past GPT-5.5 on SWE-bench Pro, trained end to end on roughly 50,000 domestic Chinese chips with no Nvidia in the loop. Three things temper the headline: the MIT weights are still marked coming soon, every benchmark is Meituan's own, and DeepSeek already charges less per token. Here is what the price actually buys.

Dark abstract network of glowing nodes, evoking a large mixture-of-experts model

Photo by Jan Huber on Unsplash

Standard rate, and the promo sitting under it

LineStandardLaunch promo
Input (uncached)$0.75$0.30
Cached input$0.015$0.006
Output$2.95$1.20

Per million tokens, USD, from the official LongCat pricing docs. The promo column is a limited-time launch rate with no published end date. Cached reads are billed at $0.015 on pay-as-you-go; on the prepaid token packs, cache hits do not draw down your quota at all, which is where the "free cache" framing you may have seen comes from. Context window is 1M tokens.

Cheap against the West, not against its neighbor

The comparison Meituan wants you to make is against the American frontier, and on that board LongCat wins going away. Take a middling coding-agent month: 40M input tokens, 8M output, no cache. LongCat at standard rates runs $53.60. GPT-5.6 Luna, OpenAI's cheapest new tier, costs $88 for the same work. Claude Sonnet 5 at its introductory rate is $160. Sol, the GPT-5.6 flagship, is $440. If your yardstick is the models most Western teams actually reach for, the savings are not subtle.

ModelInput / output40M in / 8M out
LongCat-2.0 (promo)$0.30 / $1.20$21.60
DeepSeek V4-Pro$0.435 / $0.87$24.36
LongCat-2.0 (standard)$0.75 / $2.95$53.60
Kimi K2.7 Code$0.95 / $4.00$70.00
GPT-5.6 Luna$1 / $6$88.00
GLM-5.2$1.40 / $4.40$91.20
Claude Sonnet 5 (intro)$2 / $10$160.00
GPT-5.6 Sol$5 / $30$440.00

Now look at the row that spoils the story. DeepSeek V4-Pro sits at $0.435 input and $0.87 output, and finishes that same month at $24.36. That is 55% under LongCat's standard rate, on both lines, from a model that has been on independent leaderboards for months. LongCat only slips below DeepSeek when its launch promo is running, and even then the gap is a couple of dollars, not a category difference. So the accurate framing is narrow: LongCat is the cheapest way to reach GPT-5.5-class coding among models people trust, but it is not the cheapest token on the market, and the model that beats it on price is not exotic.

The promo is the part to treat with suspicion. A $0.30/$1.20 rate with no announced expiry is a customer-acquisition number, not a planning number. Meituan has every incentive to move it once the launch traffic settles. Budget on the $0.75/$2.95 standard rate and treat anything cheaper as a windfall while it lasts.

The scores are strong. They are also Meituan grading its own homework.

LongCat's benchmark page reads like a frontier coder. The number carrying the launch is SWE-bench Pro 59.5, which Meituan places just ahead of GPT-5.5 at 58.6 and well clear of Gemini 3.1 Pro at 54.2. The agentic and search results are the more interesting ones: BrowseComp 79.9 and RWSearch 78.8 point at a model tuned hard for tool use and long-horizon retrieval, not just single-turn code generation.

BenchmarkLongCat-2.0What it measures
SWE-bench Pro59.5Hard software fixes; Meituan puts GPT-5.5 at 58.6
SWE-bench Multilingual77.3Bug-fixing across languages
Terminal-Bench 2.170.8Shell and command-line agent tasks
BrowseComp79.9Web-browsing agent competence
RWSearch78.8Real-world search and retrieval
FORTE73.2Reasoning; ties Opus 4.6, trails GPT-5.5 at 77.8

Here is the caveat that has to travel with every one of those numbers: they are self-reported. Meituan ran them on its own harness, and it labels them as such. Artificial Analysis, the usual neutral referee, is not tracking LongCat yet, so there is no independent Intelligence Index to check against. The FORTE line is the useful tell that this is not pure marketing: Meituan published a result where its model ties Claude Opus 4.6 and loses to GPT-5.5, rather than sweeping the table. A vendor willing to print a loss is more credible on the wins. Still, credible is not verified.

At this price you do not have to take the chart on faith. A month of real work costs less than a takeout order, so the honest move is to push your own task set through it and read the diffs yourself before you believe SWE-bench Pro 59.5 means anything for your codebase.

It was already the most-used model on OpenRouter, in disguise

The most persuasive signal for LongCat is not on the benchmark page. For about two months before the June 30 reveal, the preview ran on OpenRouter under the cover name "Owl Alpha," and it quietly climbed to the top of the volume charts, roughly 10 trillion tokens a month, number one globally by usage while nobody knew whose model it was. It ranked first on the Hermes agent traffic and near the top for Claude Code and OpenClaw routing.

Stealth traffic is a better testimonial than a self-run eval, because those were real developers spending real money on latency and output quality without a brand steering them. It does not prove the SWE-bench number, but it does say the model held up under production load against every alternative on the same gateway. When the blind test picks you, that counts for more than the chart you drew yourself.

"Open weight" is a promise, not a download link, yet

This is where the coverage gets sloppy, so be precise. Meituan announced an MIT license and put up a HuggingFace repo, and a lot of writeups jumped straight to calling LongCat an open model you can run yourself. As of early July the weights are marked coming soon and there is nothing to download. What exists today is API access and a repo placeholder. The license is generous; the artifact is not out. If your plan depends on self-hosting for data-residency or cost reasons, that plan is on hold until the files actually land.

The architecture is the genuinely novel part. LongCat-2.0 is a 1.6-trillion-parameter mixture-of-experts that activates around 48B parameters per token, and it varies that active count dynamically between roughly 33B and 56B depending on how hard the request is, so easy prompts cost less compute than hard ones. The headline underneath the model, though, is where it was built: Meituan says it trained and serves LongCat end to end on about 50,000 domestic Chinese accelerators, with no Nvidia hardware in the pipeline. If that holds up, it is the first trillion-parameter-class model to make that claim, and the price you are being quoted reflects an inference stack that never touched an export-controlled GPU.

You reach it three ways right now: the LongCat API at longcat.ai, which speaks both the OpenAI and Anthropic wire formats so most SDKs work with a base-URL swap; OpenRouter, where the ex-Owl Alpha listing lives; and the web chat. There is no published rate-limit or regional-restriction sheet, so if you are outside China, test your access path before you design around it.

Who should switch, and who should stay put

If you are running a coding or agent workload on GPT-5.6 or Claude and watching the bill climb, LongCat is worth a serious pilot. The token price is a third to a sixth of what you are paying, the agentic scores suggest it was built for exactly that kind of tool-heavy work, and the OpenRouter track record says it survives production. Point a slice of non-critical traffic at it, diff the outputs against your incumbent for a week, and let the results decide.

If your only goal is the lowest possible token price, LongCat is the wrong pick and DeepSeek V4-Pro is the answer, at least until the promo math changes. And if you specifically need weights on your own hardware, LongCat cannot help you today no matter what the license says. Drop your real input-output mix into a cost calculator against the standard rate, not the promo, and you will see quickly whether the switch pays for the evaluation work it takes to trust a model nobody has independently benchmarked.

Sources