
LLM Leaderboard

Live rankings by quality, speed, and value. Data from Artificial Analysis benchmarks.

Composite intelligence score from Artificial Analysis benchmarks

#1 Gemini 3.1 Pro (Google): quality index 57
#2 GPT-5.4 (OpenAI): quality index 57
#3 GPT-5.3 Codex (OpenAI): quality index 54

Quality vs Price

Higher and further left = better value. Hover a point for details.

[Interactive scatter chart: Quality Index (y-axis) vs. input price per 1M tokens (x-axis, log scale). Provider filters: OpenAI, Anthropic, Google, xAI, Meta, Mistral, DeepSeek, Amazon, NVIDIA, Cohere, Perplexity, Moonshot, Zhipu, MiniMax.]
 #  Model                       Provider    Quality Index
 1  Gemini 3.1 Pro              Google      57
 2  GPT-5.4                     OpenAI      57
 3  GPT-5.3 Codex               OpenAI      54
 4  Claude Opus 4.6 Adaptive    Anthropic   53
 5  Claude Sonnet 4.6 Adaptive  Anthropic   52
 6  GPT-5.2                     OpenAI      51
 7  GLM-5                       Zhipu       50
 8  Grok 4.20                   xAI         49
 9  Gemini 3 Pro                Google      48
10  GPT-5.1                     OpenAI      48
11  Kimi K2.5                   Moonshot    47
12  Claude Opus 4.6             Anthropic   47
13  Gemini 3 Flash Reasoning    Google      46
14  GPT-5                       OpenAI      45
15  Claude Sonnet 4.6           Anthropic   44
16  Claude Sonnet 4             Anthropic   44
17  Claude Opus 4.5             Anthropic   43
18  GPT-5 Medium                OpenAI      42
19  MiniMax M2.5                MiniMax     42
20  Grok 4                      xAI         42
21  GPT-5 Mini                  OpenAI      41
22  Grok 4.1 Fast Reasoning     xAI         39
23  o3                          OpenAI      38
24  Claude 4.5 Haiku Reasoning  Anthropic   37
25  Nemotron 3 Super 120B       NVIDIA      36
26  Nova 2.0 Pro Reasoning      Amazon      36
27  Gemini 3 Flash              Google      35
28  Gemini 2.5 Pro              Google      35
29  Gemini 3.1 Flash-Lite       Google      34
30  o4 Mini                     OpenAI      33
31  DeepSeek V3.2 (Chat)        DeepSeek    32
32  Claude Haiku 4.5            Anthropic   31
33  o1                          OpenAI      31
34  DeepSeek R1                 DeepSeek    27
35  GPT-5 Nano                  OpenAI      27
36  GPT-4.1                     OpenAI      26
37  o3 Mini                     OpenAI      26
38  Grok 4.1 Fast               xAI         24
39  GPT-4.1 Mini                OpenAI      23
40  Mistral Large 3             Mistral     23
41  Gemini 2.5 Flash            Google      21
42  Claude Haiku 3.5            Anthropic   19
43  Gemini 2.0 Flash            Google      19
44  Llama 4 Maverick            Meta        18
45  Nova 2.0 Lite               Amazon      18
46  GPT-4o                      OpenAI      17
47  Sonar Pro                   Perplexity  15
48  Gemini 2.0 Flash-Lite       Google      15
49  Llama 4 Scout               Meta        14
50  Command A                   Cohere      14
51  GPT-4.1 Nano                OpenAI      13
52  Gemini 2.5 Flash-Lite       Google      13
53  GPT-4o Mini                 OpenAI      13
54  Mistral Small 3.2           Mistral     10

Rankings based on live benchmark data. Quality = composite intelligence index. Value = quality index / input cost per 1M tokens. Latency = time to first token.

Data by Artificial Analysis
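To make the Value metric concrete, here is a minimal sketch of the quality-per-dollar calculation described above. The model records and prices are illustrative placeholders, not live TokenCost or Artificial Analysis data.

```python
# Value = quality index / input price per 1M tokens.
# Placeholder records for illustration; not live leaderboard data.
models = [
    {"name": "model-a", "quality": 57, "input_price_per_m": 2.50},
    {"name": "model-b", "quality": 42, "input_price_per_m": 0.40},
]

for m in models:
    m["value"] = m["quality"] / m["input_price_per_m"]

# Rank by value, best first: a cheap mid-tier model can outscore
# a pricier top model on this metric.
for m in sorted(models, key=lambda m: m["value"], reverse=True):
    print(f'{m["name"]}: value={m["value"]:.1f}')
```

On these placeholder numbers, model-b scores 105.0 against model-a's 22.8, which is why the Value tab tends to favor cheaper mid-tier models over the most expensive frontier ones.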

How to Use the LLM Leaderboard

  1. Choose a ranking metric

     Switch between Quality, Speed, and Value tabs to rank models by the metric that matters most to your use case.

  2. Filter by provider

     Use the provider buttons to focus on specific vendors. Compare only OpenAI models, or pit Anthropic against Google.

  3. Explore the scatter chart

     The interactive quality-vs-price chart plots every model so you can visually identify the best value picks.

Why Use This Leaderboard

  • Three ranking modes — Quality, Speed, and Value — for different decision criteria
  • Benchmark data from Artificial Analysis, refreshed every 6 hours
  • Interactive scatter chart plotting quality against cost for visual comparison
  • Provider filtering to narrow the field to vendors you're evaluating
  • Includes output speed (tokens/sec) and time-to-first-token for latency planning

Common Use Cases

Model selection

Find the highest-quality model within your budget by sorting on the Value tab.
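As a rough sketch of that workflow in code (hypothetical records and budget cap, not site data): filter out models above your price cap, then take the highest quality index among what remains.

```python
# Hypothetical records and budget cap; not live leaderboard data.
budget_per_m = 1.00  # max input price per 1M tokens, in USD

models = [
    {"name": "model-a", "quality": 57, "input_price_per_m": 2.50},
    {"name": "model-b", "quality": 42, "input_price_per_m": 0.40},
    {"name": "model-c", "quality": 35, "input_price_per_m": 0.10},
]

# Keep only models within budget, then pick the best quality index.
affordable = [m for m in models if m["input_price_per_m"] <= budget_per_m]
best = max(affordable, key=lambda m: m["quality"])
print(best["name"])  # model-b: highest quality under the cap
```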

Latency optimization

Sort by Speed to find the fastest models for real-time applications like chat or autocomplete.

Benchmark tracking

Check back regularly to see how new model releases stack up against existing options.

Stakeholder reporting

Use the scatter chart to show leadership why a specific model offers the best quality-to-cost ratio.

