LLM Leaderboard
Live rankings by quality, speed, and value. Data from Artificial Analysis benchmarks.
Quality vs Price chart: each model's composite intelligence score from Artificial Analysis benchmarks is plotted against price, so higher and further left means better value.
| # | Model | Provider | Quality Index |
|---|---|---|---|
| 1 | Gemini 3.1 Pro | Google | 57 |
| 2 | GPT-5.4 | OpenAI | 57 |
| 3 | GPT-5.3 Codex | OpenAI | 54 |
| 4 | Claude Opus 4.6 Adaptive | Anthropic | 53 |
| 5 | Claude Sonnet 4.6 Adaptive | Anthropic | 52 |
| 6 | GPT-5.2 | OpenAI | 51 |
| 7 | GLM-5 | Zhipu | 50 |
| 8 | Grok 4.20 | xAI | 49 |
| 9 | Gemini 3 Pro | Google | 48 |
| 10 | GPT-5.1 | OpenAI | 48 |
| 11 | Kimi K2.5 | Moonshot | 47 |
| 12 | Claude Opus 4.6 | Anthropic | 47 |
| 13 | Gemini 3 Flash Reasoning | Google | 46 |
| 14 | GPT-5 | OpenAI | 45 |
| 15 | Claude Sonnet 4.6 | Anthropic | 44 |
| 16 | Claude Sonnet 4 | Anthropic | 44 |
| 17 | Claude Opus 4.5 | Anthropic | 43 |
| 18 | GPT-5 Medium | OpenAI | 42 |
| 19 | MiniMax M2.5 | MiniMax | 42 |
| 20 | Grok 4 | xAI | 42 |
| 21 | GPT-5 Mini | OpenAI | 41 |
| 22 | Grok 4.1 Fast Reasoning | xAI | 39 |
| 23 | o3 | OpenAI | 38 |
| 24 | Claude 4.5 Haiku Reasoning | Anthropic | 37 |
| 25 | Nemotron 3 Super 120B | NVIDIA | 36 |
| 26 | Nova 2.0 Pro Reasoning | Amazon | 36 |
| 27 | Gemini 3 Flash | Google | 35 |
| 28 | Gemini 2.5 Pro | Google | 35 |
| 29 | Gemini 3.1 Flash-Lite | Google | 34 |
| 30 | o4 Mini | OpenAI | 33 |
| 31 | DeepSeek V3.2 (Chat) | DeepSeek | 32 |
| 32 | Claude Haiku 4.5 | Anthropic | 31 |
| 33 | o1 | OpenAI | 31 |
| 34 | DeepSeek R1 | DeepSeek | 27 |
| 35 | GPT-5 Nano | OpenAI | 27 |
| 36 | GPT-4.1 | OpenAI | 26 |
| 37 | o3 Mini | OpenAI | 26 |
| 38 | Grok 4.1 Fast | xAI | 24 |
| 39 | GPT-4.1 Mini | OpenAI | 23 |
| 40 | Mistral Large 3 | Mistral | 23 |
| 41 | Gemini 2.5 Flash | Google | 21 |
| 42 | Claude Haiku 3.5 | Anthropic | 19 |
| 43 | Gemini 2.0 Flash | Google | 19 |
| 44 | Llama 4 Maverick | Meta | 18 |
| 45 | Nova 2.0 Lite | Amazon | 18 |
| 46 | GPT-4o | OpenAI | 17 |
| 47 | Sonar Pro | Perplexity | 15 |
| 48 | Gemini 2.0 Flash-Lite | Google | 15 |
| 49 | Llama 4 Scout | Meta | 14 |
| 50 | Command A | Cohere | 14 |
| 51 | GPT-4.1 Nano | OpenAI | 13 |
| 52 | Gemini 2.5 Flash-Lite | Google | 13 |
| 53 | GPT-4o Mini | OpenAI | 13 |
| 54 | Mistral Small 3.2 | Mistral | 10 |
Rankings based on live benchmark data. Quality = composite intelligence index. Value = quality index / input cost per 1M tokens. Latency = time to first token.
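The Value formula above (quality index divided by input cost per 1M tokens) can be sketched in a few lines of Python. The model prices and scores below are hypothetical placeholders for illustration, not live leaderboard data:

```python
# Illustrative value-score calculation: quality index / input cost per 1M tokens.
# All numbers here are made-up examples, not real benchmark or pricing data.

def value_score(quality_index: float, input_price_per_1m: float) -> float:
    """Higher is better: quality points per dollar of input cost."""
    if input_price_per_1m <= 0:
        raise ValueError("input price must be positive")
    return quality_index / input_price_per_1m

# A pricier frontier model vs. a cheap small model (hypothetical prices):
frontier = value_score(57, 10.0)   # 5.7 quality points per $1 of input
budget = value_score(34, 0.50)     # 68.0 quality points per $1 of input
print(frontier, budget)
```

This is why a mid-quality model can top the Value tab: dividing by a much smaller price outweighs a lower quality index.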
Data by Artificial Analysis
How to Use the LLM Leaderboard
1. Choose a ranking metric. Switch between the Quality, Speed, and Value tabs to rank models by the metric that matters most to your use case.
2. Filter by provider. Use the provider buttons to focus on specific vendors: compare only OpenAI models, or pit Anthropic against Google.
3. Explore the scatter chart. The interactive quality-vs-price chart plots every model so you can visually identify the best-value picks.
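The tab-and-filter workflow boils down to filtering a list of model records by provider and sorting by the chosen metric. A minimal sketch in Python, using made-up example records rather than real leaderboard entries:

```python
# Sketch of the rank-and-filter logic behind the tabs and provider buttons.
# The records below are illustrative placeholders, not real benchmark values.
models = [
    {"name": "Model A", "provider": "OpenAI", "quality": 57, "tokens_per_sec": 90},
    {"name": "Model B", "provider": "Google", "quality": 48, "tokens_per_sec": 210},
    {"name": "Model C", "provider": "OpenAI", "quality": 41, "tokens_per_sec": 150},
]

def rank(records, metric, provider=None):
    """Optionally filter by provider, then sort descending by the chosen metric."""
    if provider is not None:
        records = [m for m in records if m["provider"] == provider]
    return sorted(records, key=lambda m: m[metric], reverse=True)

# "Speed tab" restricted to a single provider:
for m in rank(models, "tokens_per_sec", provider="OpenAI"):
    print(m["name"], m["tokens_per_sec"])
```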
Why Use This Leaderboard
- Three ranking modes — Quality, Speed, and Value — for different decision criteria
- Benchmark data from Artificial Analysis, refreshed every 6 hours
- Interactive scatter chart plotting quality against cost for visual comparison
- Provider filtering to narrow the field to vendors you're evaluating
- Includes output speed (tokens/sec) and time-to-first-token for latency planning
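Because the leaderboard reports both output speed (tokens/sec) and time to first token, the two can be combined into a rough end-to-end latency estimate: total time is approximately TTFT plus output tokens divided by tokens per second. A sketch with hypothetical numbers:

```python
def estimated_latency(ttft_s: float, output_tokens: int, tokens_per_sec: float) -> float:
    """Rough wall-clock time for one response: first-token wait plus generation time."""
    return ttft_s + output_tokens / tokens_per_sec

# Hypothetical model: 0.4 s to first token, streaming at 120 tokens/s.
# A 300-token reply would take roughly 2.9 seconds end to end.
print(round(estimated_latency(0.4, 300, 120), 1))
```

For short replies TTFT dominates; for long ones, tokens/sec does, which is why both columns matter for latency planning.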
Common Use Cases
- Model selection: find the highest-quality model within your budget by sorting on the Value tab.
- Latency optimization: sort by Speed to find the fastest models for real-time applications like chat or autocomplete.
- Benchmark tracking: check back regularly to see how new model releases stack up against existing options.
- Stakeholder reporting: use the scatter chart to show leadership why a specific model offers the best quality-to-cost ratio.