TokenCost

TokenBlog

Model releases, pricing breakdowns, and practical guides for developers.

MiniMax M3 claims GPT-5.5-class coding for a tenth of the price. The benchmarks are self-reported and the cheap tier stops at 512K.
Model Release · June 2, 2026 · 8 min read

MiniMax M3 went live on June 1 at $0.60 input and $2.40 output per million tokens, with a 50% launch promo cutting it to $0.30/$1.20 for the first week. It pairs a 1M context window and image-plus-video input with coding scores MiniMax says match GPT-5.5 on SWE-Bench Pro. Two catches: every benchmark is vendor-reported, and the cheap rate doubles once a request crosses 512K input. We work through the tiered rate card, the cost math against GPT-5.5, Opus 4.8, and DeepSeek V4-Pro, and where M3 actually wins.
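The tiered rate described above can be sketched as a small cost function. This is a rough illustration, not MiniMax's billing logic: the list rates and the 512K threshold come from the summary, while the function name `m3_request_cost`, the choice to double both input and output rates for the whole request, and the way the promo stacks with the higher tier are assumptions made for the example.

```python
# Sketch only: list rates come from the summary above; the tier rule is an assumption.
INPUT_RATE = 0.60      # USD per million input tokens (list price)
OUTPUT_RATE = 2.40     # USD per million output tokens (list price)
TIER_LIMIT = 512_000   # input tokens; above this the cheap rate doubles


def m3_request_cost(input_tokens: int, output_tokens: int, promo: bool = False) -> float:
    """Estimate the USD cost of one request under the assumed tier rule."""
    # Assumption: crossing the limit doubles both rates for the whole request.
    multiplier = 2.0 if input_tokens > TIER_LIMIT else 1.0
    # Launch promo: 50% off list price (assumed to apply in both tiers).
    discount = 0.5 if promo else 1.0
    cost = (input_tokens * INPUT_RATE + output_tokens * OUTPUT_RATE) / 1_000_000
    return cost * multiplier * discount


if __name__ == "__main__":
    # Hypothetical request sizes, chosen to land on either side of the threshold.
    for tokens_in in (100_000, 600_000):
        print(tokens_in, round(m3_request_cost(tokens_in, 10_000), 4))
```

Under these assumptions, a request just over the threshold costs twice what the same request would cost just under it, so trimming a 520K prompt below 512K saves more than the trimmed tokens alone suggest.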

GPT-5.5 has a pricing cliff at 272K tokens. Cross it and the whole request bills double.
Guide · June 1, 2026 · 8 min read

Cohere shipped a 218B Apache 2.0 model that runs on two H100s. The hosted rate matches Command A; the license is the actual news.
Model Release · May 31, 2026 · 9 min read

Anthropic kept the Opus list price flat for the third release running. The headline is the new Fast mode: $10/$50 at 2.5x speed.
Model Release · May 30, 2026 · 8 min read

Qwen3.7 Max ties Opus 4.7 on intelligence and beats it on agentic coding, at half the input price. The catch: closed weights and no images.
Model Release · May 28, 2026 · 9 min read

Cursor Composer 2.5 lands one point behind Opus 4.7 on SWE-Bench and bills ten times less. The promo just ended.
Comparison · May 27, 2026 · 9 min read

GPT-5.5 Pro lists at $30/$180 per million tokens, and it drops the cache discount that keeps the standard model cheap.
Comparison · May 25, 2026 · 8 min read

Grok Build runs coding agents at $1 in, $2 out. The catch is that xAI published zero benchmarks.
Model Release · May 24, 2026 · 7 min read

DeepSeek scrapped the May 31 price cliff. The 75% V4-Pro cut is the permanent rate now.
Industry · May 23, 2026 · 8 min read

Google cut AI Ultra to $100 at I/O. Now all three AI giants charge the exact same $20, $100, $200 ladder.
Comparison · May 22, 2026 · 8 min read

Your Claude subscription stops paying for agents on June 15. The SDK and claude -p move to a separate credit billed at full API rates.
Industry · May 21, 2026 · 9 min read

Gemini 3.5 Flash costs 3x what Gemini 3 Flash did. Google priced it that way because it beats their own Pro model on agentic work.
Model Release · May 20, 2026 · 10 min read

xAI quietly swapped Grok 4.3's weights for Colossus-2-trained ones and switched on Agent Mode. Same endpoint, same $1.25 input, new build.
Model Release · May 19, 2026 · 9 min read

SubQ 1M-Preview shipped 12M tokens of context, no public rate card, and a $8-vs-$2,600 cost story that does not reconcile.
Model Release · May 18, 2026 · 10 min read

Qwen3.6 Max Preview is the first Qwen flagship Alibaba shipped closed. It took SWE-Bench Pro, lost SWE-Bench Verified by omission, and costs 4x what Qwen3.6 Plus does.
Model Release · May 17, 2026 · 11 min read

Anthropic is raising at $950B, up 2.5x in three months. Every recent move on the Claude price sheet has been a cut. That gap is the whole story.
Industry · May 16, 2026 · 10 min read

MiniMax M2.7 lists at $0.30 input, $1.20 output. The GDPval-AA score is the highest open-weight number on the board. The SWE-Bench Verified column is empty.
Model Release · May 15, 2026 · 11 min read

Qwen3 Coder Next is still $0.11 per million input. Kimi K2.6 costs 7x more, GPT-5.5 costs 41x more, and the benchmark gaps do not justify either.
Comparison · May 14, 2026 · 10 min read

Mercury 2 outputs at 788 tokens per second for $0.75 per million. The diffusion math turns frontier reasoning pricing into a rounding error.
Model Release · May 13, 2026 · 11 min read

OpenAI's three new voice models price three different ways. Here is what an hour actually costs.
Model Release · May 12, 2026 · 10 min read

Tencent's Hunyuan HY3 Preview is the cheapest frontier-class model, and it's 14 points behind the leaders on coding
Model Release · May 11, 2026 · 9 min read

GPT-5.5 vs Opus 4.7 vs Gemini 3.1 Pro: the cheapest one depends entirely on whether you cross 200K tokens
Comparison · May 10, 2026 · 10 min read

DeepSeek V4-Pro's 75% promo ends May 31. After that, the price is 4x what most people are quoting.
Research · May 8, 2026 · 9 min read

GLM-4.7-flash sits at $0.07 input on AWS Bedrock and Vertex. Most coverage skipped this one.
Research · May 7, 2026 · 8 min read

Mistral Medium 3.5 charges 17x more than DeepSeek V4 Flash and loses the only benchmark they both report
Comparison · May 6, 2026 · 9 min read

Three weeks of Opus 4.7 bills are in. The tokenizer change costs an extra 25 to 37 percent in production.
Research · May 5, 2026 · 10 min read

GPT-5.5 doubled the per-token price. The 40% efficiency claim does not get OpenAI back to even.
Research · May 4, 2026 · 10 min read

GLM-5.1 took SWE-Bench Pro at $1.40/M input. The catch is the small print, not the price.
Model Release · May 3, 2026 · 8 min read

Grok 4.3 quietly cut prices 40-60%, then shipped a voice cloner. The pricing math is now hard to ignore.
Model Release · May 2, 2026 · 9 min read

DeepSeek V4-Flash redrew the budget LLM tier. Here is where Haiku 4.5, GPT-5.4 Nano, and Gemini Flash-Lite now sit.
Comparison · May 1, 2026 · 9 min read

Qwen3-Next-Thinking: the cheapest reasoning model under $1/M output
Research · April 30, 2026 · 8 min read

OpenAI on AWS Bedrock: what it costs and what actually changed
Comparison · April 30, 2026 · 7 min read

GitHub Copilot goes metered: what developers will actually pay per 1M tokens
Comparison · April 29, 2026 · 8 min read

DeepSeek V4 vs GPT-5.5 vs Claude Opus 4.7: what three frontier models actually cost
Comparison · April 28, 2026 · 9 min read

Gemini Deep Research: what each AI research report actually costs
Guide · April 24, 2026 · 8 min read

Kimi K2.6: the fastest model in the top 5, at the lowest price
Model Release · April 23, 2026 · 7 min read

Claude Code vs OpenAI Codex: which coding agent actually costs less in 2026?
Comparison · April 21, 2026 · 9 min read

Voice AI APIs in 2026: what Gemini TTS, Voxtral TTS, and OpenAI TTS actually cost per hour
Comparison · April 20, 2026 · 7 min read

Gemini 3 Flash: $0.50 per million tokens, thinking on by default, and it actually beats Pro on agentic tasks
Model Release · April 19, 2026 · 7 min read

Tokenmaxxing can inflate your LLM API bill by 10x. On Gemini and GPT-5.4, it's worse.
Guide · April 18, 2026 · 7 min read

Claude Opus 4.7: $5 per million tokens - and what that actually means now
Model Release · April 17, 2026 · 8 min read

Claude Code Routines: what each automated run actually costs
Guide · April 16, 2026 · 7 min read

GPT-5.4 computer use: what a real agent task actually costs
Guide · April 14, 2026 · 8 min read

DeepSeek V4: $0.30 per million tokens for a 1 trillion parameter model
Model Release · April 13, 2026 · 8 min read

Chatbot Arena April 2026: Claude leads everything, Grok 4.20 has the cheapest output
Comparison · April 12, 2026 · 7 min read

OpenAI's new $100 ChatGPT Pro: what you actually get on Codex, and when the API wins anyway
Comparison · April 11, 2026 · 7 min read

GPT-5.5 "Spud": release date, pricing forecast, and what we actually know right now
Model Release · April 11, 2026 · 7 min read

LLM API pricing in April 2026: from $0.05 to $125 per million tokens
Comparison · April 10, 2026 · 9 min read

Meta Muse Spark: no API pricing, no open weights, and one area where it's best in the world
Industry · April 9, 2026 · 8 min read

Project Glasswing and Claude Mythos Preview: the AI that found a 27-year-old bug for under $50
Industry · April 8, 2026 · 9 min read

Is Claude Code getting worse? The data says something did change on March 8.
Research · April 8, 2026 · 8 min read

Claude Max subscribers using OpenClaw now pay API rates. Here's the math.
Industry · April 8, 2026 · 7 min read

Microsoft MAI models: what MAI-Transcribe-1, MAI-Voice-1, and MAI-Image-2 actually cost
Model Release · April 7, 2026 · 7 min read

Gemini 3.1 Pro: $2 input, tied for #1 on benchmarks, and 20% cheaper than GPT-5.4
Model Release · April 7, 2026 · 8 min read

Qwen3.5-Omni: the pricing, the audio benchmarks, and whether the architecture hype is real
Model Release · April 6, 2026 · 7 min read

Qwen3.6-Plus: $0.28 per million input tokens, and the benchmark comparison Alibaba chose not to lead with
Model Release · April 6, 2026 · 7 min read

Claude Haiku 3 retires April 19: it's not just a model ID swap
Guide · April 5, 2026 · 7 min read

Gemma 4 is out: $0.14 per million tokens for a 31B model scoring 89% on AIME
Model Release · April 5, 2026 · 8 min read

Reasoning models in 2026: $0.55 to $20 per million tokens, and when each tier makes sense
Comparison · April 5, 2026 · 7 min read

Gemini Flex and Priority inference: how Google's new tiers work and what they cost
Guide · April 4, 2026 · 8 min read

Google Gemini API billing caps are live: what developers need to know
Industry · April 1, 2026 · 7 min read

OpenAI Deep Research API: what it costs, and why o3-deep-research is 5x pricier than o3
Guide · April 1, 2026 · 7 min read

OpenAI killed Sora. The math explains why.
Industry · March 31, 2026 · 9 min read

ARC-AGI-3: the benchmark no AI can crack, and what running it costs
Research · March 31, 2026 · 8 min read

Gemini 2.0 Flash is deprecated: what migration actually costs you
Guide · March 30, 2026 · 8 min read

Kimi K2.5 vs GPT-5.4: the model Cursor built on, and what it actually costs
Comparison · March 29, 2026 · 9 min read

OpenAI Codex pricing: API costs, container billing, and how it stacks up against Claude Code
Comparison · March 28, 2026 · 8 min read

How much does Claude Code actually cost per session?
Guide · March 28, 2026 · 8 min read

Claude Mythos pricing: what Anthropic's leaked new model will cost developers
Model Release · March 27, 2026 · 7 min read

Grok 4.20 Beta: $2 per million tokens, 2M context, and the lowest hallucination rate measured so far
Model Release · March 26, 2026 · 8 min read

Llama 4 Scout vs Maverick: API pricing, self-hosting costs, and which one to use
Comparison · March 26, 2026 · 9 min read

DeepSeek V3.2 vs GPT-5.4: Is the 30x price gap worth it?
Comparison · March 25, 2026 · 8 min read

Qwen3.5 Small: the 9B model that beats gpt-oss-120B on four benchmarks
Model Release · March 24, 2026 · 7 min read

Anthropic drops the 2x long-context surcharge: what Claude now costs at 1M tokens
Industry · March 24, 2026 · 7 min read

Xiaomi MiMo-V2-Pro: the trillion-parameter model that fooled everyone into thinking it was DeepSeek
Model Release · March 23, 2026 · 9 min read

Gemini 3.1 Flash-Lite: $0.25 per million tokens, 1M context, and benchmark scores that beat Claude Haiku
Model Release · March 23, 2026 · 8 min read

Mistral Small 4: $0.15 per million input tokens for a multimodal MoE model
Model Release · March 23, 2026 · 7 min read

GPT-5.4 Mini vs Nano: pricing, benchmarks, and which one to use
Comparison · March 23, 2026 · 9 min read

Claude 5 release date: what Anthropic has actually said
Industry · March 20, 2026 · 8 min read

How to cut your LLM API bill by 60% without changing models
Guide · March 20, 2026 · 10 min read

The AI Price Index: How LLM costs dropped 300x in three years
Research · March 20, 2026 · 12 min read

GLM-5 Turbo: the first model built for OpenClaw. Is it worth $1.20 per million tokens?
Model Release · March 16, 2026 · 6 min read

NVIDIA Nemotron 3 Super: Pricing, Benchmarks & What 12B Active Parameters Actually Gets You
Model Release · March 13, 2026 · 8 min read

Gemini Embedding 2: Pricing, Limits, and How It Compares to OpenAI
Model Release · March 11, 2026 · 6 min read

GPT-6 Release Date: What We Actually Know Right Now
Model Release · March 10, 2026 · 7 min read

Anthropic Built a Marketplace. No Commission, Complicated Timing.
Industry · March 9, 2026 · 5 min read

OpenAI GPT-5.4: Pricing, Benchmarks & What Developers Actually Need to Know
Model Release · March 5, 2026 · 8 min read

GPT-5.4 vs Claude Opus 4.6 vs Gemini 3.1 Pro: Which One Should You Actually Use?
Comparison · March 6, 2026 · 10 min read