Nemotron 3 Ultra has no list price. The same open weights run from $0.37 to $0.60 a million depending on who hosts them.
NVIDIA released Nemotron 3 Ultra 550B on June 4, the largest open model it has shipped and, by Artificial Analysis' reckoning, the smartest open-weights model any US lab has put out. Here is the wrinkle for anyone pricing it: there is no rate card. The weights are open, so whoever hosts them picks the number, and right now that number runs from $0.37 to $0.60 per million input tokens depending on the provider, with a free endpoint thrown in. Below we walk the spread across hosts, run the workload math against Kimi K2.6 and DeepSeek V4-Pro, and get to the part NVIDIA buried: on raw intelligence it is still second to China.

Why this model breaks the usual pricing question
With a closed model you ask one question: what does the lab charge? GPT-5.5 is $5 in and $30 out, full stop, because only OpenAI serves it. Nemotron 3 Ultra does not work that way. NVIDIA published the weights and a recommended NIM container, then let the market host it. By launch day more than two dozen providers were serving it, and they do not agree on a price. So the honest answer to “what does it cost” is “which host did you pick.”
That is the upside of open weights and the catch in the same breath. You get to shop the same model for the lowest rate, but you also have to, because the gap between the cheapest and the priciest paid host is real money at volume. We will start there.
The same model, six different bills
Every row below prices identical weights. What changes from row to row is the provider's margin, its hardware, and how hard it is courting launch traffic. Rows are sorted by input price.
| Host | Input / 1M | Output / 1M | Note |
|---|---|---|---|
| OpenRouter (free tier) | $0 | $0 | Rate-limited, for testing |
| Lowest launch rate | $0.37 | $1.08 | Cheapest seen across hosts at launch |
| OpenRouter / DeepInfra | $0.50 | $2.50 | 1M context |
| Together.ai | $0.60 | $3.60 | Cached input $0.20 |
| Cross-provider median | $0.60 | $2.60 | Per Artificial Analysis |
Output is where the hosts diverge hardest. The cheapest endpoints run it at $1.08 while Together asks $3.60 for the very same model, a 3.3x spread on the half of the bill that usually weighs most. Speed splits the field too: the fastest providers clear 400 tokens a second, while Artificial Analysis measures a cross-provider median closer to 140. If you are going to commit volume here, the host you route to matters more than the model you chose.
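To see what that spread does to an actual bill, here is a minimal sketch that prices one job against the paid rows of the table above and sorts the hosts cheapest first. The rates are copied from the table; the function names and the 210M-in / 90M-out monthly volume are our own illustrative assumptions, not anything a provider publishes.

```python
# Per-million-token rates in USD as (input, output), copied from the host table above.
HOST_RATES = {
    "Lowest launch rate": (0.37, 1.08),
    "OpenRouter / DeepInfra": (0.50, 2.50),
    "Together.ai": (0.60, 3.60),
}


def bill(input_tokens, output_tokens, rates):
    """Dollar cost of a job given (input, output) per-million-token rates."""
    in_rate, out_rate = rates
    return (input_tokens * in_rate + output_tokens * out_rate) / 1_000_000


def rank_hosts(input_tokens, output_tokens, hosts=HOST_RATES):
    """Return (host, cost) pairs sorted from cheapest to most expensive."""
    costs = {name: bill(input_tokens, output_tokens, r) for name, r in hosts.items()}
    return sorted(costs.items(), key=lambda pair: pair[1])


if __name__ == "__main__":
    # Hypothetical monthly volume: 210M input tokens and 90M output tokens.
    for name, cost in rank_hosts(210_000_000, 90_000_000):
        print(f"{name:<24} ${cost:,.2f}")
```

At that assumed volume the same weights cost about $175 on the cheapest launch endpoint, $330 at the OpenRouter rate, and $450 on Together, before any cached-input discount.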
Stacked against the rest of the open shelf
Take the headline OpenRouter rate of $0.50/$2.50 and drop Nemotron 3 Ultra in among the open-weights models people actually compare it to. It is not the cheapest seat, and on output it is one of the dearer ones.
| Model | Input / 1M | Output / 1M | Context |
|---|---|---|---|
| Llama 4 Scout | $0.08 | $0.30 | 1M |
| DeepSeek V4-Flash | $0.14 | $0.28 | 1M |
| NVIDIA Nemotron 3 Super 120B | $0.30 | $0.80 | 1M |
| DeepSeek V4-Pro | $0.44 | $0.87 | 1M |
| NVIDIA Nemotron 3 Ultra 550B | $0.50 | $2.50 | 1M |
| Kimi K2.6 | $0.95 | $4.00 | 262K |
The reading depends on which half of the bill dominates your workload. On input Nemotron sits mid-pack, pricier than both DeepSeek tiers and NVIDIA's own Super, cheaper than Kimi. On output it jumps to $2.50, nearly triple DeepSeek V4-Pro's $0.87 and more than triple Super's $0.80. That makes Nemotron 3 Ultra a poor fit for generation-heavy jobs and a defensible one for long-context reading tasks where input volume dwarfs output, the workload it was tuned for.
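One way to make "which half dominates" concrete is a blended rate: the average price per million tokens once you fix the share of traffic that is input. The sketch below uses the table's rates; the `blended_rate` helper and the 50, 70 and 95 percent input shares are illustrative assumptions, so swap in your own split.

```python
# (input, output) USD per 1M tokens, from the comparison table above.
MODEL_RATES = {
    "DeepSeek V4-Pro": (0.44, 0.87),
    "Nemotron 3 Ultra 550B": (0.50, 2.50),
    "Kimi K2.6": (0.95, 4.00),
}


def blended_rate(rates, input_share):
    """Average USD per 1M tokens when `input_share` of all tokens are input."""
    if not 0.0 <= input_share <= 1.0:
        raise ValueError("input_share must be between 0 and 1")
    in_rate, out_rate = rates
    return input_share * in_rate + (1.0 - input_share) * out_rate


if __name__ == "__main__":
    # Hypothetical traffic mixes: even split, 70/30, and heavily input-bound.
    for share in (0.50, 0.70, 0.95):
        row = ", ".join(
            f"{name}: ${blended_rate(r, share):.2f}" for name, r in MODEL_RATES.items()
        )
        print(f"{share:.0%} input -> {row}")
```

On these rates Nemotron's blended price is about 2.3 times DeepSeek V4-Pro's at an even split and about 1.3 times at 95 percent input, which is the arithmetic behind calling it defensible for read-heavy work.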
Four workloads, dollars at the door
Nemotron uses the $0.50/$2.50 OpenRouter rate here. Route to the cheapest launch endpoint at $0.37/$1.08 and every Nemotron cell drops by roughly 40 to 47 percent, depending on how much of the job is output. The monthly row assumes a 70/30 input-output split.
| Workload | Nemotron 3 Ultra | Kimi K2.6 | DeepSeek V4-Pro | Nemotron 3 Super |
|---|---|---|---|---|
| One-off question (10K in / 3K out) | $0.01 | $0.02 | $0.01 | $0.01 |
| Repo summarization (50K in / 15K out) | $0.06 | $0.11 | $0.04 | $0.03 |
| Long-context agent (200K in / 40K out) | $0.20 | $0.35 | $0.12 | $0.09 |
| 300M tokens/mo (70/30) | $330 | $560 | $170 | $135 |
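The table is plain arithmetic, tokens times the per-million rate for each side of the bill, and it can be recomputed with a short script. This is a sketch under the rates quoted in this article; the dictionary and function names are ours, and the monthly row is expanded to 210M input and 90M output tokens to match the 70/30 split.

```python
# (input, output) USD per 1M tokens, as quoted in this article.
RATES = {
    "Nemotron 3 Ultra": (0.50, 2.50),
    "Kimi K2.6": (0.95, 4.00),
    "DeepSeek V4-Pro": (0.44, 0.87),
    "Nemotron 3 Super": (0.30, 0.80),
}

# (input tokens, output tokens) for each workload row.
WORKLOADS = {
    "One-off question": (10_000, 3_000),
    "Repo summarization": (50_000, 15_000),
    "Long-context agent": (200_000, 40_000),
    "300M tokens/mo (70/30)": (210_000_000, 90_000_000),
}


def job_cost(tokens, rates):
    """Dollar cost of one workload at one model's rates."""
    in_tok, out_tok = tokens
    in_rate, out_rate = rates
    return (in_tok * in_rate + out_tok * out_rate) / 1_000_000


def cost_table():
    """Nested dict: workload -> model -> unrounded dollar cost."""
    return {
        workload: {model: job_cost(tokens, rates) for model, rates in RATES.items()}
        for workload, tokens in WORKLOADS.items()
    }


if __name__ == "__main__":
    for workload, row in cost_table().items():
        cells = "  ".join(f"{model}: ${cost:,.4f}" for model, cost in row.items())
        print(f"{workload}: {cells}")
```

The script prints unrounded figures, so small single-job cells will show more digits than the table does. Changing a pair in `WORKLOADS` is the quickest way to test your own token mix.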
At 300M tokens a month Nemotron 3 Ultra runs about $330, comfortably under Kimi K2.6's $560 but nearly double DeepSeek V4-Pro's $170 and well over its own little sibling at $135. The pitch is not that it is the cheapest open model, because it plainly is not. It is that it is the most capable American one, and you pay a premium over DeepSeek for that while still undercutting Kimi. Whether that premium is worth it comes down to the next table.
Best in the US, second to China
Artificial Analysis scores Nemotron 3 Ultra at 48 on its Intelligence Index, which makes it the top US open-weights model and ninth overall. The headline NVIDIA did not lead with: Kimi K2.6 sits at 54, so the open crown still belongs to Moonshot.
| Open-weights model | AA Intelligence Index | Origin |
|---|---|---|
| Kimi K2.6 | 54 | China |
| Nemotron 3 Ultra 550B | 48 | US |
| Gemma 4 31B | 39 | US |
| Nemotron 3 Super 120B | 36 | US |
| gpt-oss-120b | 33 | US |
On the component benchmarks NVIDIA published on the model card, the picture is strong across the board. These are the vendor's own numbers, so weight them accordingly, but they line up with where Artificial Analysis landed.
| Benchmark | Score | What it measures |
|---|---|---|
| MMLU-Pro | 86.8 | General knowledge and reasoning |
| GPQA (no tools) | 87.0 | Graduate-level science |
| LiveCodeBench v6 | 89.0 | Competitive coding |
| SWE-Bench Verified | 71.9 | Real GitHub issue fixes |
| RULER @ 1M | 94.7 | Long-context retrieval |
The SWE-Bench Verified figure is the single-harness headline; NVIDIA's blog frames the agentic range as 65 to 70 percent across five different scaffolds, so expect the lower end in a real setup. The RULER score is the one that earns the price premium on input-heavy work: holding 94.7 at a full million tokens is the behavior you are paying $2.50 output to get, because it means you can stuff a whole codebase or document set into context and trust the recall.
How a 550B model serves like a small one
Nemotron 3 Ultra carries 550 billion parameters but fires only 55 billion per token, a roughly 10-to-1 sparsity that keeps serving costs down for a model this size. The build is a hybrid: Mamba-2 state-space layers handle most of the sequence, a few attention layers step in for the work attention does best, a latent Mixture-of-Experts layer adds capacity, and multi-token prediction layers speed up generation. NVIDIA pre-trained it in its NVFP4 4-bit format, which is also why it claims up to 5x the per-GPU throughput of a comparable BF16 model on Blackwell hardware. Native context is a clean 1 million tokens, though a given host may serve a shorter window, so check the provider's spec before you rely on the full million. The internal expert and layer counts floating around secondary blogs are not in NVIDIA's materials, so we are leaving them out rather than repeat a guess.
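For a feel of the scale involved, here is a back-of-envelope sketch of the active fraction and the storage the weights alone would need at different precisions. The parameter counts come from the paragraph above; treating every weight as exactly 4 or 16 bits is our simplifying assumption, since it ignores quantization scale factors, any layers kept at higher precision, and all runtime memory such as caches and activations.

```python
TOTAL_PARAMS = 550e9   # total parameters, as stated in this article
ACTIVE_PARAMS = 55e9   # parameters used per token, as stated in this article


def active_fraction(total=TOTAL_PARAMS, active=ACTIVE_PARAMS):
    """Share of the model's parameters that fire for each token."""
    return active / total


def weight_gigabytes(params, bits_per_param):
    """Idealized weight storage in GB (1e9 bytes); weights only, no overheads."""
    return params * bits_per_param / 8 / 1e9


if __name__ == "__main__":
    print(f"Active fraction: {active_fraction():.0%}")
    # Assumed precisions: 4 bits (NVFP4-like, idealized) and 16 bits (BF16).
    for bits in (4, 16):
        print(f"{bits}-bit weights: ~{weight_gigabytes(TOTAL_PARAMS, bits):,.0f} GB")
```

Under those assumptions the full set of weights is about 275 GB at 4 bits against about 1,100 GB at 16 bits. Real checkpoints will differ, so take the host's or the model card's figures over this estimate.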
The license is open, but read which one
Nemotron 3 Ultra ships under the OpenMDW License 1.1, the Linux Foundation's permissive Open Model, Data and Weights license, and commercial use is allowed. Worth flagging because several launch write-ups got it wrong: this is not Apache 2.0, and it is not the older “NVIDIA Open Model License” that governed earlier Nemotron drops. The terms are friendlier than NVIDIA's previous license, but if you are shipping a product on these weights, read the actual model card rather than trust a summary, this one included.
So who should rent it
Nemotron 3 Ultra makes sense if you want the strongest open model that comes out of a US lab, you care about long-context recall, and your workload is heavier on reading than writing. In that lane it is genuinely good, and the free OpenRouter endpoint means you can confirm the fit before you spend anything. Shop the hosts before you commit volume, since the cheapest endpoint can land well below the $0.50 headline rate most people will quote you.
It is the wrong pick if cost per token is the only thing you are optimizing. DeepSeek V4-Pro is cheaper on both ends, NVIDIA's own Super 120B is cheaper still if you can give up 12 points on the Intelligence Index (36 against 48), and if you simply want the smartest open model regardless of provenance, Kimi K2.6 outscores it and is widely hosted. The decision really turns on how much the “American open weights” line matters to you, and on whether your traffic skews toward the input side where Nemotron's pricing is competitive rather than the output side where it is not.
Put Nemotron 3 Ultra next to the models you are weighing on the full pricing table, or push your own token mix through the calculator to see which host and which model land cheapest for your split.
Sources
- NVIDIA Technical Blog: Nemotron 3 Ultra - June 4 launch, architecture, throughput and SWE-Bench harness range
- Hugging Face: Nemotron 3 Ultra 550B model card - 550B/55B params, 1M context, benchmark table, OpenMDW 1.1 license
- Artificial Analysis: Nemotron 3 Ultra - Intelligence Index 48, cross-provider median price, speed measurements
- OpenRouter: Nemotron 3 Ultra providers - $0.50/$2.50 listing, free endpoint, per-host rates
- Together.ai: Nemotron 3 Ultra - $0.60/$3.60 with $0.20 cached input, 1M context
- Artificial Analysis: launch analysis - Kimi K2.6 at 54 vs Nemotron at 48, open-weights field comparison