LLM API pricing in April 2026: from $0.05 to $125 per million tokens
The spread from the cheapest LLM API rate ($0.05 per million input tokens) to the priciest restricted-model rate ($125 per million output tokens) is now 2,500x - and GPT-5.4 Pro's $180/M output sits higher still. This is the full picture as of April 2026 - every major model, every tier, and where the real value actually sits.

Quick reference
| Tier | Example | Input/1M | Output/1M |
|---|---|---|---|
| Ultra-cheap | GPT-5 Nano | $0.05 | $0.40 |
| Budget | DeepSeek V3.2 | $0.28 | $0.42 |
| Mid-range | GPT-5.4 | $2.50 | $15 |
| Premium | Claude Opus 4.6 | $5 | $25 |
| Reasoning | o3-pro | $20 | $80 |
| Restricted | Claude Mythos Preview | $25 | $125 |
Most teams pick a model, ship, and never reconsider. That works fine until you get a surprise invoice. The range of LLM API prices in April 2026 is bigger than it has ever been - not because the cheap models got more expensive, but because the expensive end keeps getting pushed further out.
GPT-5 Nano launched at $0.05 per million input tokens. Claude Mythos Preview costs $125 per million output tokens. Between them sits nearly every use case you can imagine. The interesting question is not which model is the "best" - it's which tier matches the complexity of your actual task.
Full pricing table, April 2026
Prices are standard API rates. Batch API and prompt caching reduce these further - see the cost multipliers section below.
| Model | Provider | Input/1M | Output/1M |
|---|---|---|---|
| GPT-5 Nano | OpenAI | $0.05 | $0.40 |
| Qwen3.5-9B | Alibaba | $0.05 | $0.15 |
| Llama 4 Scout | Meta | $0.08 | $0.30 |
| Gemini 2.5 Flash-Lite | Google | $0.10 | $0.40 |
| GPT-4.1 Nano | OpenAI | $0.10 | $0.40 |
| Gemma 4 31B | Google | $0.14 | $0.40 |
| Mistral Small 4 | Mistral | $0.15 | $0.60 |
| Llama 3.3 70B | Meta | $0.18 | $0.18 |
| Grok 4.1 Fast | xAI | $0.20 | $0.50 |
| Llama 4 Maverick | Meta | $0.27 | $0.85 |
| DeepSeek V3.2 | DeepSeek | $0.28 | $0.42 |
| Gemini 2.5 Flash | Google | $0.30 | $2.50 |
| GPT-5.4 Mini | OpenAI | $0.75 | $4.50 |
| o3 / o4 Mini | OpenAI | $1.10 | $4.40 |
| Gemini 3.1 Pro | Google | $2.00 | $12 |
| Grok 4.20 | xAI | $2.00 | $6 |
| GPT-5.4 | OpenAI | $2.50 | $15 |
| Claude Sonnet 4.6 | Anthropic | $3 | $15 |
| Grok 4 | xAI | $3 | $15 |
| Claude Opus 4.6 | Anthropic | $5 | $25 |
| o3-pro | OpenAI | $20 | $80 |
| Claude Mythos Preview* | Anthropic | $25 | $125 |
| GPT-5.4 Pro | OpenAI | $30 | $180 |
* Claude Mythos Preview is restricted to Project Glasswing partners - not publicly available. Open-weight models (Llama, Qwen, Gemma) are priced via hosted inference providers. See our full pricing page for batch, caching, and context-length variants.
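Turning a rate from the table into a per-request dollar figure is simple arithmetic. A minimal sketch - model names and rates copied from the table above, assuming flat linear on-demand pricing with no caching, batching, or context surcharges:

```python
# Estimate on-demand API cost from the April 2026 table above.
# Rates are USD per million tokens, copied from the table.
RATES = {
    "gpt-5-nano":      (0.05, 0.40),
    "deepseek-v3.2":   (0.28, 0.42),
    "gpt-5.4":         (2.50, 15.00),
    "claude-opus-4.6": (5.00, 25.00),
}

def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Cost in USD for a single call at standard on-demand rates."""
    inp, out = RATES[model]
    return input_tokens / 1_000_000 * inp + output_tokens / 1_000_000 * out

# A 2,000-token prompt with a 500-token reply:
print(f"${request_cost('gpt-5.4', 2000, 500):.4f}")    # $0.0125
print(f"${request_cost('gpt-5-nano', 2000, 500):.4f}")  # $0.0003
```

The same 2,500-token call differs by roughly 40x between the two models - which is the whole argument for matching tier to task.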
Under $0.20/M input: what you actually get
A year ago, $0.10/M input was cheap. Now it's the middle of the ultra-cheap tier. GPT-5 Nano at $0.05/M is surprisingly capable at structured tasks: classification, extraction, routing, simple summarization. It falls apart on anything requiring multi-step reasoning, nuanced writing, or domain knowledge at depth. That's fine - you probably don't need deep reasoning for a spam filter.
Qwen3.5-9B sits at the same price ($0.05/M) with one real difference: it's open-weight under Apache 2.0, so you can self-host or run it on any inference provider that supports it. Llama 4 Scout ($0.08/M) is open-weight too, with a 1M context window - genuinely useful for document processing at scale without paying context surcharges.
Grok 4.1 Fast at $0.20/M deserves a mention because it offers a 2M context window - the largest of any model at this price point. If you're doing long-context work on a budget, it's the right pick right now.
The ceiling here is real. At $0.05/M, you get fast, cheap processing for simple tasks. Push these models past their limits and you spend more on retries and error handling than you saved on token costs.
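The retry penalty is easy to put numbers on. A sketch of the expected cost per successful request - the failure rates here are illustrative assumptions, not measurements, and the rates come from the table above:

```python
# When does a cheap model's retry rate erase its price advantage?
# Failure rates below are illustrative assumptions, not measurements.
def effective_cost(rate_per_m: float, tokens: int, failure_rate: float) -> float:
    """Expected token cost per successful request, retrying until success."""
    attempts = 1 / (1 - failure_rate)  # expected attempts (geometric series)
    return tokens / 1_000_000 * rate_per_m * attempts

cheap = effective_cost(0.05, 3000, 0.40)  # GPT-5 Nano, 40% task failure (assumed)
mid   = effective_cost(2.50, 3000, 0.02)  # GPT-5.4, 2% failure (assumed)
print(cheap < mid)  # token cost alone still favors the cheap model -
                    # the real retry cost is latency, error handling,
                    # and engineering time, which this math ignores
```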
$0.20 to $1/M: where most production apps live
This is the densest part of the pricing spectrum. DeepSeek V3.2 at $0.28/M is the value story of late 2025 and into 2026 - a full-size 671B parameter model at a price that would have been impossible 18 months ago. Mistral Small 4 at $0.15/M is a 119B MoE with 6.5B active parameters, multimodal, with a configurable reasoning toggle. Llama 4 Maverick at $0.27/M runs on 1M context and benchmarks reasonably against older flagship models.
The comparison that matters here: DeepSeek V3.2 at $0.28/$0.42 per million tokens vs GPT-5.4 at $2.50/$15. On many benchmarks they are within a few percentage points of each other. On output costs specifically, DeepSeek is 36x cheaper per million tokens. For a product generating significant output volume - chatbots, document generation, summarization pipelines - that gap compounds quickly.
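At volume, the output-rate gap dominates everything else. A back-of-envelope sketch using the table rates, with a hypothetical output-heavy workload of 50M input and 200M output tokens per month:

```python
# Monthly cost for a hypothetical workload: 50M input, 200M output tokens.
def monthly(inp_rate: float, out_rate: float, inp_m: float = 50, out_m: float = 200) -> float:
    return inp_m * inp_rate + out_m * out_rate  # rates are $/1M tokens

deepseek = monthly(0.28, 0.42)   # $14 input  + $84 output   = $98
gpt54    = monthly(2.50, 15.00)  # $125 input + $3,000 output = $3,125
print(round(deepseek), round(gpt54))  # 98 3125
```

Same workload, a ~$3,000/month difference - and note that output, not input, is where almost all of the GPT-5.4 bill lands.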
Where this tier struggles: agentic tasks that require many sequential reasoning steps, code generation for complex problems, and anything requiring consistent adherence to detailed instructions across long sessions. We've found DeepSeek V3.2 handles most single-turn tasks well but drifts on multi-turn agentic pipelines in ways GPT-5.4 and Claude Sonnet 4.6 do not - the failure mode tends to be dropping earlier context once the tool call chain gets long.
$1 to $5/M: the flagship cluster
GPT-5.4 ($2.50/M), Gemini 3.1 Pro ($2.00/M), Grok 4.20 ($2.00/M), and Claude Sonnet 4.6 ($3.00/M) compete in the same tier and score within a few points of each other on most leaderboards. The meaningful differences are in specialization: Sonnet 4.6 tends to perform best on coding tasks with complex tool use; Gemini 3.1 Pro has the strongest multimodal capabilities and ranked first on GPQA Diamond at 94.3% (per the Artificial Analysis leaderboard); Grok 4.20 offers a 2M context window at a lower price than the others.
Claude Opus 4.6 at $5/M input sits at the top of what most developers would call production-accessible. It's Anthropic's strongest general-purpose model you can actually call from a production app without restricting access. For complex agentic workflows - multi-step coding, autonomous research, long-context analysis - it produces fewer errors that require expensive retries than models at lower tiers.
One thing worth noting about this tier: o3 and o4 Mini from OpenAI ($1.10/$4.40) are reasoning models, not conventional completion models - they spend tokens thinking before they answer. For the right task - math, code verification, structured reasoning - they punch well above their price. For simple text generation or latency-sensitive inference, they are slow and offer no quality advantage over standard alternatives.
$20 to $180/M: what sits at the edge
Three models occupy the $20+ range: o3-pro ($20/$80), Claude Mythos Preview ($25/$125), and GPT-5.4 Pro ($30/$180). None of them are general-purpose production models.
o3-pro is OpenAI's extended reasoning model. The documentation recommends using it asynchronously - average time to first token is around 101 seconds. You pay $20/M input for much longer chain-of-thought computation than a standard model runs. For complex scientific or mathematical problems where you need verified results, it can be worth it. For anything real-time, it is not.
Claude Mythos Preview is restricted to Project Glasswing security partners - AWS, Apple, Google, Microsoft, Cisco, CrowdStrike, JPMorgan, Nvidia. You cannot buy access. At $25/M input and $125/M output, it is five times more expensive than Claude Opus 4.6 and currently used exclusively for defensive security work. It found a 27-year-old OpenBSD zero-day for under $50 and identified 181 Firefox exploits in a single run, where Claude Opus 4.6 found 2. For the use cases it was built for, that gap is meaningful. Full breakdown here.
GPT-5.4 Pro at $30/$180 is the same base model as GPT-5.4 with extended thinking enabled. You pay 12x the base rate for reasoning tokens. For most teams, the cleaner path is to use o3-pro for deep reasoning tasks and standard GPT-5.4 for everything else - rather than switching a production app to Pro mode and multiplying the invoice by an order of magnitude.
What actually changes these numbers
The prices above are standard on-demand rates. Three mechanisms can move them substantially - the first two in your favor, the third against:
Batch API
OpenAI, Anthropic, and Google all offer 50% discounts for non-real-time processing. Queue requests and wait up to 24 hours, and every price in the table above gets cut in half.
Prompt caching
For workloads with a long system prompt or repeated context, caching cuts input costs by 50-90%. GPT-5.4 cached input: $1.25/M (vs $2.50 standard). Claude Opus 4.6: $0.50/M (vs $5). The math changes dramatically for chatbots with long conversation histories.
Context surcharges
GPT-5.4 charges 2x on input past 272K tokens ($5/M instead of $2.50/M). Gemini 3.1 Pro doubles its rate above 200K tokens. Long-context use cases can cost up to 2x what the headline price suggests.
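The surcharge makes long-context input pricing piecewise rather than flat. A sketch using GPT-5.4's 272K threshold - the assumption here is that only tokens above the threshold bill at the surcharged rate, so verify the split against your provider's actual billing:

```python
# GPT-5.4 input cost with the 2x surcharge past 272K tokens.
# Assumes tokens under the threshold bill at the base rate and only
# the excess bills at the surcharged rate - check this against invoices.
THRESHOLD = 272_000
BASE, SURCHARGED = 2.50, 5.00  # $/1M input tokens

def input_cost(tokens: int) -> float:
    below = min(tokens, THRESHOLD)
    above = max(tokens - THRESHOLD, 0)
    return below / 1e6 * BASE + above / 1e6 * SURCHARGED

print(round(input_cost(200_000), 4))  # 0.5  - entirely below the threshold
print(round(input_cost(400_000), 4))  # 1.32 - $0.68 base + $0.64 surcharged
```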
Combined, the discounts can push effective costs well below the sticker price. A cached, batched call to GPT-5.4 works out to $0.625/M input - below GPT-5.4 Mini's standard rate and within reach of the budget tier. The cost calculator handles all the variants if you want to run your own numbers.
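The discounts stack multiplicatively, so the combined effect is worth computing explicitly. A sketch using the discount factors quoted in this section:

```python
# Effective input rate after stacking prompt caching and batch discounts.
# Discount factors are the ones quoted in this section.
def effective_rate(base: float, cached: bool = False, batched: bool = False,
                   cache_discount: float = 0.50) -> float:
    rate = base
    if cached:
        rate *= 1 - cache_discount  # 50-90% off cached input, provider-dependent
    if batched:
        rate *= 0.5                 # flat 50% batch discount
    return rate

print(round(effective_rate(2.50, cached=True, batched=True), 4))  # 0.625 - GPT-5.4
print(round(effective_rate(5.00, cached=True, batched=True,
                           cache_discount=0.90), 4))              # 0.25  - Opus 4.6
```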
When to move up a tier (and when not to)
The case for staying at the budget tier is stronger than it was a year ago. DeepSeek V3.2, Llama 4 Maverick, and Mistral Small 4 handle a lot of real production workloads competently. The case for moving to mid-range is specific: you need reliable multi-step reasoning, consistent tool use in agentic pipelines, or nuanced writing your users notice and compare.
Stick with the budget tier if:
- Your task is classification, extraction, or structured output
- You run high volume (100M+ tokens/month) and cost is the binding constraint
- The quality gap between tiers is invisible to your end user
- You can batch-process and wait hours for results
Move up a tier if:
- You're building agentic workflows with multiple tool calls per run
- Instruction-following failures are causing expensive retries
- Users compare output quality directly (creative work, code review)
- Your output is long-form and needs consistent quality end-to-end
One pattern worth avoiding: using a mid-range model as a catch-all when a cheap model handles 80% of requests equally well. Routing - sending simple queries to GPT-5 Nano and complex ones to GPT-5.4 or Claude Sonnet 4.6 - is more engineering work but can cut your average cost by 60-70% without visible quality regression. We wrote a full breakdown of that approach in how to cut your LLM API bill by 60% without changing models.
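A router does not need to be sophisticated to capture most of the savings. A minimal sketch - the keyword heuristic, length cutoff, and model choices below are illustrative assumptions, not a recommendation:

```python
# Route cheap-to-handle requests to GPT-5 Nano, the rest to GPT-5.4.
# The keyword heuristic is a deliberately naive stand-in - production
# routers typically use a small classifier model or explicit task labels.
SIMPLE_HINTS = ("classify", "extract", "translate", "route")

def pick_model(prompt: str, needs_tools: bool = False) -> str:
    if needs_tools or len(prompt) > 4000:
        return "gpt-5.4"              # agentic or long-context work goes upmarket
    if any(hint in prompt.lower() for hint in SIMPLE_HINTS):
        return "gpt-5-nano"           # structured single-turn tasks stay cheap
    return "gpt-5.4"                  # default to quality when unsure

print(pick_model("Classify this ticket as billing or technical."))        # gpt-5-nano
print(pick_model("Refactor this module and run the tests.", needs_tools=True))  # gpt-5.4
```

The default-to-quality fallback matters: routing errors in the cheap direction cost retries, while routing errors in the expensive direction only cost margin.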
Sources
- OpenAI API pricing - GPT-5.4, GPT-5 Nano, o3-pro, GPT-5.4 Pro rates
- Anthropic pricing - Claude Opus 4.6, Sonnet 4.6, Mythos Preview rates
- Google AI pricing - Gemini 3.1 Pro, Gemini 2.5 Flash series
- xAI API pricing - Grok 4, Grok 4.20, Grok 4.1 Fast rates
- DeepSeek API pricing - DeepSeek V3.2 rates
- Mistral pricing - Mistral Small 4, Devstral rates
- Artificial Analysis - independent benchmark scores and provider pricing aggregation