LLM API pricing in April 2026: from $0.05 to $125 per million tokens
The spread from the cheapest LLM API rate ($0.05 per million input tokens) to the priciest restricted-model rate ($125 per million output tokens) is now 2,500x - and GPT-5.4 Pro's $180/M output sits higher still. This is the full picture as of April 2026 - every major model, every tier, and where the real value actually sits.

Quick reference
| Tier | Example | Input/1M | Output/1M |
|---|---|---|---|
| Ultra-cheap | GPT-5 Nano | $0.05 | $0.40 |
| Budget | DeepSeek V3.2 | $0.28 | $0.42 |
| Mid-range | GPT-5.4 | $2.50 | $15 |
| Premium | Claude Opus 4.6 | $5 | $25 |
| Reasoning | o3-pro | $20 | $80 |
| Restricted | Claude Mythos Preview | $25 | $125 |
Most teams pick a model, ship, and never reconsider. That works fine until you get a surprise invoice. The range of LLM API prices in April 2026 is bigger than it has ever been - not because the cheap models got more expensive, but because the expensive end keeps getting pushed further out.
GPT-5 Nano launched at $0.05 per million input tokens. Claude Mythos Preview costs $125 per million output tokens. Between them sits nearly every use case you can imagine. The interesting question is not which model is the "best" - it's which tier matches the complexity of your actual task.
Full pricing table, April 2026
Prices are standard API rates. Batch API and prompt caching reduce these further - see the cost multipliers section below.
| Model | Provider | Input/1M | Output/1M |
|---|---|---|---|
| GPT-5 Nano | OpenAI | $0.05 | $0.40 |
| Qwen3.5-9B | Alibaba | $0.05 | $0.15 |
| Llama 4 Scout | Meta | $0.08 | $0.30 |
| Gemini 2.5 Flash-Lite | Google | $0.10 | $0.40 |
| GPT-4.1 Nano | OpenAI | $0.10 | $0.40 |
| Gemma 4 31B | Google | $0.14 | $0.40 |
| Mistral Small 4 | Mistral | $0.15 | $0.60 |
| Llama 3.3 70B | Meta | $0.18 | $0.18 |
| Grok 4.1 Fast | xAI | $0.20 | $0.50 |
| Llama 4 Maverick | Meta | $0.27 | $0.85 |
| DeepSeek V3.2 | DeepSeek | $0.28 | $0.42 |
| Gemini 2.5 Flash | Google | $0.30 | $2.50 |
| GPT-5.4 Mini | OpenAI | $0.75 | $4.50 |
| o3 / o4 Mini | OpenAI | $1.10 | $4.40 |
| Gemini 3.1 Pro | Google | $2.00 | $12 |
| Grok 4.20 | xAI | $2.00 | $6 |
| GPT-5.4 | OpenAI | $2.50 | $15 |
| Claude Sonnet 4.6 | Anthropic | $3 | $15 |
| Grok 4 | xAI | $3 | $15 |
| Claude Opus 4.6 | Anthropic | $5 | $25 |
| o3-pro | OpenAI | $20 | $80 |
| Claude Mythos Preview* | Anthropic | $25 | $125 |
| GPT-5.4 Pro | OpenAI | $30 | $180 |
* Claude Mythos Preview is restricted to Project Glasswing partners - not publicly available. Open-weight models (Llama, Qwen, Gemma) are priced via hosted inference providers. See our full pricing page for batch, caching, and context-length variants.
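Turning a rate from the table into a per-request dollar figure is simple arithmetic. A minimal sketch - model names and rates copied from the table above, assuming flat linear on-demand pricing with no caching, batching, or context surcharges:

```python
# Estimate on-demand API cost from the April 2026 table above.
# Rates are USD per million tokens, copied from the table.
RATES = {
    "gpt-5-nano":      (0.05, 0.40),
    "deepseek-v3.2":   (0.28, 0.42),
    "gpt-5.4":         (2.50, 15.00),
    "claude-opus-4.6": (5.00, 25.00),
}

def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Cost in USD for a single call at standard on-demand rates."""
    inp, out = RATES[model]
    return input_tokens / 1_000_000 * inp + output_tokens / 1_000_000 * out

# A 2,000-token prompt with a 500-token reply:
print(f"${request_cost('gpt-5.4', 2000, 500):.4f}")    # $0.0125
print(f"${request_cost('gpt-5-nano', 2000, 500):.4f}")  # $0.0003
```

The same 2,500-token call differs by roughly 40x between the two models - which is the whole argument for matching tier to task.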
Under $0.20/M input: what you actually get
A year ago, $0.10/M input was cheap. Now it's the middle of the ultra-cheap tier. GPT-5 Nano at $0.05/M is surprisingly capable at structured tasks: classification, extraction, routing, simple summarization. It falls apart on anything requiring multi-step reasoning, nuanced writing, or domain knowledge at depth. That's fine - you probably don't need deep reasoning for a spam filter.
Qwen3.5-9B sits at the same price ($0.05/M) with one real difference: it's open-weight under Apache 2.0, so you can self-host or run it on any inference provider that supports it. Llama 4 Scout ($0.08/M) is open-weight too, with a 1M context window - genuinely useful for document processing at scale without paying context surcharges.
Grok 4.1 Fast at $0.20/M deserves a mention because it offers a 2M context window - the largest of any model at this price point. If you're doing long-context work on a budget, it's the right pick right now.
The ceiling here is real. At $0.05/M, you get fast, cheap processing for simple tasks. Push these models past their limits and you spend more on retries and error handling than you saved on token costs.
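The retry penalty is easy to put numbers on. A sketch of the expected cost per successful request - the failure rates here are illustrative assumptions, not measurements, and the rates come from the table above:

```python
# When does a cheap model's retry rate erase its price advantage?
# Failure rates below are illustrative assumptions, not measurements.
def effective_cost(rate_per_m: float, tokens: int, failure_rate: float) -> float:
    """Expected token cost per successful request, retrying until success."""
    attempts = 1 / (1 - failure_rate)  # expected attempts (geometric series)
    return tokens / 1_000_000 * rate_per_m * attempts

cheap = effective_cost(0.05, 3000, 0.40)  # GPT-5 Nano, 40% task failure (assumed)
mid   = effective_cost(2.50, 3000, 0.02)  # GPT-5.4, 2% failure (assumed)
print(cheap < mid)  # token cost alone still favors the cheap model -
                    # the real retry cost is latency, error handling,
                    # and engineering time, which this math ignores
```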
$0.20 to $1/M: where most production apps live
This is the densest part of the pricing spectrum. DeepSeek V3.2 at $0.28/M is the value story of late 2025 and into 2026 - a full-size 671B parameter model at a price that would have been impossible 18 months ago. Mistral Small 4 at $0.15/M is a 119B MoE with 6.5B active parameters, multimodal, with a configurable reasoning toggle. Llama 4 Maverick at $0.27/M runs on 1M context and benchmarks reasonably against older flagship models.
The comparison that matters here: DeepSeek V3.2 at $0.28/$0.42 per million tokens vs GPT-5.4 at $2.50/$15. On many benchmarks they are within a few percentage points of each other. On output costs specifically, DeepSeek is 36x cheaper per million tokens. For a product generating significant output volume - chatbots, document generation, summarization pipelines - that gap compounds quickly.
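At volume, the output-rate gap dominates everything else. A back-of-envelope sketch using the table rates, with a hypothetical output-heavy workload of 50M input and 200M output tokens per month:

```python
# Monthly cost for a hypothetical workload: 50M input, 200M output tokens.
def monthly(inp_rate: float, out_rate: float, inp_m: float = 50, out_m: float = 200) -> float:
    return inp_m * inp_rate + out_m * out_rate  # rates are $/1M tokens

deepseek = monthly(0.28, 0.42)   # $14 input  + $84 output   = $98
gpt54    = monthly(2.50, 15.00)  # $125 input + $3,000 output = $3,125
print(round(deepseek), round(gpt54))  # 98 3125
```

Same workload, a ~$3,000/month difference - and note that output, not input, is where almost all of the GPT-5.4 bill lands.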
Where this tier struggles: agentic tasks that require many sequential reasoning steps, code generation for complex problems, and anything requiring consistent adherence to detailed instructions across long sessions. We've found DeepSeek V3.2 handles most single-turn tasks well but drifts on multi-turn agentic pipelines in ways GPT-5.4 and Claude Sonnet 4.6 do not - the failure mode tends to be dropping earlier context once the tool call chain gets long.
$1 to $5/M: the flagship cluster
GPT-5.4 ($2.50/M), Gemini 3.1 Pro ($2.00/M), Grok 4.20 ($2.00/M), and Claude Sonnet 4.6 ($3.00/M) compete in the same tier and score within a few points of each other on most leaderboards. The meaningful differences are in specialization: Sonnet 4.6 tends to perform best on coding tasks with complex tool use; Gemini 3.1 Pro has the strongest multimodal capabilities and ranked first on GPQA Diamond at 94.3% (per the Artificial Analysis leaderboard); Grok 4.20 offers a 2M context window at a lower price than the others.
Claude Opus 4.6 at $5/M input sits at the top of what most developers would call production-accessible. It's Anthropic's strongest general-purpose model you can actually call from a production app without restricting access. For complex agentic workflows - multi-step coding, autonomous research, long-context analysis - it produces fewer errors that require expensive retries than models at lower tiers.
One thing worth noting about this tier: o3 and o4 Mini from OpenAI ($1.10/$4.40) are reasoning models, not conventional completion models - they spend tokens thinking before they answer. For the right task - math, code verification, structured reasoning - they punch well above their price. For simple text generation or latency-sensitive inference, they are slow and offer no quality advantage over standard alternatives.
$20 to $180/M: what sits at the edge
Three models occupy the $20+ range: o3-pro ($20/$80), Claude Mythos Preview ($25/$125), and GPT-5.4 Pro ($30/$180). None of them are general-purpose production models.
o3-pro is OpenAI's extended reasoning model. The documentation recommends using it asynchronously - average time to first token is around 101 seconds. You pay $20/M input for much longer chain-of-thought computation than a standard model runs. For complex scientific or mathematical problems where you need verified results, it can be worth it. For anything real-time, it is not.
Claude Mythos Preview is restricted to Project Glasswing security partners - AWS, Apple, Google, Microsoft, Cisco, CrowdStrike, JPMorgan, Nvidia. You cannot buy access. At $25/M input and $125/M output, it is five times more expensive than Claude Opus 4.6 and currently used exclusively for defensive security work. It found a 27-year-old OpenBSD zero-day for under $50 and identified 181 Firefox exploits in a single run, where Claude Opus 4.6 found 2. For the use cases it was built for, that gap is meaningful. Full breakdown here.
GPT-5.4 Pro at $30/$180 is the same base model as GPT-5.4 with extended thinking enabled. You pay 12x the base rate for reasoning tokens. For most teams, the cleaner path is to use o3-pro for deep reasoning tasks and standard GPT-5.4 for everything else - rather than switching a production app to Pro mode and multiplying the invoice by an order of magnitude.
What actually changes these numbers
The prices above are standard on-demand rates. Three mechanisms can move them substantially - the first two in your favor, the third against:
Batch API
OpenAI, Anthropic, and Google all offer 50% discounts for non-real-time processing. Queue requests and wait up to 24 hours, and every price in the table above gets cut in half.
Prompt caching
For workloads with a long system prompt or repeated context, caching cuts input costs by 50-90%. GPT-5.4 cached input: $1.25/M (vs $2.50 standard). Claude Opus 4.6: $0.50/M (vs $5). The math changes dramatically for chatbots with long conversation histories.
Context surcharges
GPT-5.4 charges 2x on input past 272K tokens ($5/M instead of $2.50/M). Gemini 3.1 Pro doubles its rate above 200K tokens. Long-context use cases can cost up to 2x what the headline price suggests.
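The surcharge makes long-context input pricing piecewise rather than flat. A sketch using GPT-5.4's 272K threshold - the assumption here is that only tokens above the threshold bill at the surcharged rate, so verify the split against your provider's actual billing:

```python
# GPT-5.4 input cost with the 2x surcharge past 272K tokens.
# Assumes tokens under the threshold bill at the base rate and only
# the excess bills at the surcharged rate - check this against invoices.
THRESHOLD = 272_000
BASE, SURCHARGED = 2.50, 5.00  # $/1M input tokens

def input_cost(tokens: int) -> float:
    below = min(tokens, THRESHOLD)
    above = max(tokens - THRESHOLD, 0)
    return below / 1e6 * BASE + above / 1e6 * SURCHARGED

print(round(input_cost(200_000), 4))  # 0.5  - entirely below the threshold
print(round(input_cost(400_000), 4))  # 1.32 - $0.68 base + $0.64 surcharged
```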
Combined, the discounts can push effective costs well below the sticker price. A cached, batched call to GPT-5.4 works out to $0.625/M input - below GPT-5.4 Mini's standard rate and within reach of the budget tier. The cost calculator handles all the variants if you want to run your own numbers.
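The discounts stack multiplicatively, so the combined effect is worth computing explicitly. A sketch using the discount factors quoted in this section:

```python
# Effective input rate after stacking prompt caching and batch discounts.
# Discount factors are the ones quoted in this section.
def effective_rate(base: float, cached: bool = False, batched: bool = False,
                   cache_discount: float = 0.50) -> float:
    rate = base
    if cached:
        rate *= 1 - cache_discount  # 50-90% off cached input, provider-dependent
    if batched:
        rate *= 0.5                 # flat 50% batch discount
    return rate

print(round(effective_rate(2.50, cached=True, batched=True), 4))  # 0.625 - GPT-5.4
print(round(effective_rate(5.00, cached=True, batched=True,
                           cache_discount=0.90), 4))              # 0.25  - Opus 4.6
```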
When to move up a tier (and when not to)
The case for staying at the budget tier is stronger than it was a year ago. DeepSeek V3.2, Llama 4 Maverick, and Mistral Small 4 handle a lot of real production workloads competently. The case for moving to mid-range is specific: you need reliable multi-step reasoning, consistent tool use in agentic pipelines, or nuanced writing your users notice and compare.
Stick with the budget tier if:
- Your task is classification, extraction, or structured output
- You run high volume (100M+ tokens/month) and cost is the binding constraint
- The quality gap between tiers is invisible to your end user
- You can batch-process and wait hours for results
Move up a tier if:
- You're building agentic workflows with multiple tool calls per run
- Instruction-following failures are causing expensive retries
- Users compare output quality directly (creative work, code review)
- Your output is long-form and needs consistent quality end-to-end
One pattern worth avoiding: using a mid-range model as a catch-all when a cheap model handles 80% of requests equally well. Routing - sending simple queries to GPT-5 Nano and complex ones to GPT-5.4 or Claude Sonnet 4.6 - is more engineering work but can cut your average cost by 60-70% without visible quality regression. We wrote a full breakdown of that approach in how to cut your LLM API bill by 60% without changing models.
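A router does not need to be sophisticated to capture most of the savings. A minimal sketch - the keyword heuristic, length cutoff, and model choices below are illustrative assumptions, not a recommendation:

```python
# Route cheap-to-handle requests to GPT-5 Nano, the rest to GPT-5.4.
# The keyword heuristic is a deliberately naive stand-in - production
# routers typically use a small classifier model or explicit task labels.
SIMPLE_HINTS = ("classify", "extract", "translate", "route")

def pick_model(prompt: str, needs_tools: bool = False) -> str:
    if needs_tools or len(prompt) > 4000:
        return "gpt-5.4"              # agentic or long-context work goes upmarket
    if any(hint in prompt.lower() for hint in SIMPLE_HINTS):
        return "gpt-5-nano"           # structured single-turn tasks stay cheap
    return "gpt-5.4"                  # default to quality when unsure

print(pick_model("Classify this ticket as billing or technical."))        # gpt-5-nano
print(pick_model("Refactor this module and run the tests.", needs_tools=True))  # gpt-5.4
```

The default-to-quality fallback matters: routing errors in the cheap direction cost retries, while routing errors in the expensive direction only cost margin.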
Sources
- OpenAI API pricing - GPT-5.4, GPT-5 Nano, o3-pro, GPT-5.4 Pro rates
- Anthropic pricing - Claude Opus 4.6, Sonnet 4.6, Mythos Preview rates
- Google AI pricing - Gemini 3.1 Pro, Gemini 2.5 Flash series
- xAI API pricing - Grok 4, Grok 4.20, Grok 4.1 Fast rates
- DeepSeek API pricing - DeepSeek V3.2 rates
- Mistral pricing - Mistral Small 4, Devstral rates
- Artificial Analysis - independent benchmark scores and provider pricing aggregation