LLM Leaderboard
Live rankings by quality, speed, and value. Data from Artificial Analysis benchmarks.
Quality vs Price chart: each model's composite intelligence score from Artificial Analysis benchmarks is plotted against price, so higher and further left means better value.
| # | Model | Provider | Quality Index |
|---|---|---|---|
| 1 | Gemini 3.1 Pro | Google | 57 |
| 2 | GPT-5.4 | OpenAI | 57 |
| 3 | GPT-5.3 Codex | OpenAI | 54 |
| 4 | Claude Opus 4.6 Adaptive | Anthropic | 53 |
| 5 | Claude Sonnet 4.6 Adaptive | Anthropic | 52 |
| 6 | GPT-5.2 | OpenAI | 51 |
| 7 | GLM-5 | Zhipu | 50 |
| 8 | Grok 4.20 | xAI | 49 |
| 9 | Gemini 3 Pro | Google | 48 |
| 10 | GPT-5.1 | OpenAI | 48 |
| 11 | Kimi K2.5 | Moonshot | 47 |
| 12 | Claude Opus 4.6 | Anthropic | 47 |
| 13 | Gemini 3 Flash Reasoning | Google | 46 |
| 14 | GPT-5 | OpenAI | 45 |
| 15 | Claude Sonnet 4.6 | Anthropic | 44 |
| 16 | Claude Sonnet 4 | Anthropic | 44 |
| 17 | Claude Opus 4.5 | Anthropic | 43 |
| 18 | GPT-5 Medium | OpenAI | 42 |
| 19 | MiniMax M2.5 | MiniMax | 42 |
| 20 | Grok 4 | xAI | 42 |
| 21 | GPT-5 Mini | OpenAI | 41 |
| 22 | Grok 4.1 Fast Reasoning | xAI | 39 |
| 23 | o3 | OpenAI | 38 |
| 24 | Claude 4.5 Haiku Reasoning | Anthropic | 37 |
| 25 | Nemotron 3 Super 120B | NVIDIA | 36 |
| 26 | Nova 2.0 Pro Reasoning | Amazon | 36 |
| 27 | Gemini 3 Flash | Google | 35 |
| 28 | Gemini 2.5 Pro | Google | 35 |
| 29 | Gemini 3.1 Flash-Lite | Google | 34 |
| 30 | o4 Mini | OpenAI | 33 |
| 31 | DeepSeek V3.2 (Chat) | DeepSeek | 32 |
| 32 | Claude Haiku 4.5 | Anthropic | 31 |
| 33 | o1 | OpenAI | 31 |
| 34 | DeepSeek R1 | DeepSeek | 27 |
| 35 | GPT-5 Nano | OpenAI | 27 |
| 36 | GPT-4.1 | OpenAI | 26 |
| 37 | o3 Mini | OpenAI | 26 |
| 38 | Grok 4.1 Fast | xAI | 24 |
| 39 | GPT-4.1 Mini | OpenAI | 23 |
| 40 | Mistral Large 3 | Mistral | 23 |
| 41 | Gemini 2.5 Flash | Google | 21 |
| 42 | Claude Haiku 3.5 | Anthropic | 19 |
| 43 | Gemini 2.0 Flash | Google | 19 |
| 44 | Llama 4 Maverick | Meta | 18 |
| 45 | Nova 2.0 Lite | Amazon | 18 |
| 46 | GPT-4o | OpenAI | 17 |
| 47 | Sonar Pro | Perplexity | 15 |
| 48 | Gemini 2.0 Flash-Lite | Google | 15 |
| 49 | Llama 4 Scout | Meta | 14 |
| 50 | Command A | Cohere | 14 |
| 51 | GPT-4.1 Nano | OpenAI | 13 |
| 52 | Gemini 2.5 Flash-Lite | Google | 13 |
| 53 | GPT-4o Mini | OpenAI | 13 |
| 54 | Mistral Small 3.2 | Mistral | 10 |
Rankings based on live benchmark data. Quality = composite intelligence index. Value = quality index / input cost per 1M tokens. Latency = time to first token.
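The Value formula above (quality index divided by input cost per 1M tokens) can be sketched in a few lines of Python. The model prices and scores below are hypothetical placeholders for illustration, not live leaderboard data:

```python
# Illustrative value-score calculation: quality index / input cost per 1M tokens.
# All numbers here are made-up examples, not real benchmark or pricing data.

def value_score(quality_index: float, input_price_per_1m: float) -> float:
    """Higher is better: quality points per dollar of input cost."""
    if input_price_per_1m <= 0:
        raise ValueError("input price must be positive")
    return quality_index / input_price_per_1m

# A pricier frontier model vs. a cheap small model (hypothetical prices):
frontier = value_score(57, 10.0)   # 5.7 quality points per $1 of input
budget = value_score(34, 0.50)     # 68.0 quality points per $1 of input
print(frontier, budget)
```

This is why a mid-quality model can top the Value tab: dividing by a much smaller price outweighs a lower quality index.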
Data by Artificial Analysis
How to Use the LLM Leaderboard
1. Choose a ranking metric. Switch between the Quality, Speed, and Value tabs to rank models by the metric that matters most to your use case.
2. Filter by provider. Use the provider buttons to focus on specific vendors: compare only OpenAI models, or pit Anthropic against Google.
3. Explore the scatter chart. The interactive quality-vs-price chart plots every model so you can visually identify the best-value picks.
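The tab-and-filter workflow boils down to filtering a list of model records by provider and sorting by the chosen metric. A minimal sketch in Python, using made-up example records rather than real leaderboard entries:

```python
# Sketch of the rank-and-filter logic behind the tabs and provider buttons.
# The records below are illustrative placeholders, not real benchmark values.
models = [
    {"name": "Model A", "provider": "OpenAI", "quality": 57, "tokens_per_sec": 90},
    {"name": "Model B", "provider": "Google", "quality": 48, "tokens_per_sec": 210},
    {"name": "Model C", "provider": "OpenAI", "quality": 41, "tokens_per_sec": 150},
]

def rank(records, metric, provider=None):
    """Optionally filter by provider, then sort descending by the chosen metric."""
    if provider is not None:
        records = [m for m in records if m["provider"] == provider]
    return sorted(records, key=lambda m: m[metric], reverse=True)

# "Speed tab" restricted to a single provider:
for m in rank(models, "tokens_per_sec", provider="OpenAI"):
    print(m["name"], m["tokens_per_sec"])
```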
Why Use This Leaderboard
- Three ranking modes — Quality, Speed, and Value — for different decision criteria
- Benchmark data from Artificial Analysis, refreshed every 6 hours
- Interactive scatter chart plotting quality against cost for visual comparison
- Provider filtering to narrow the field to vendors you're evaluating
- Includes output speed (tokens/sec) and time-to-first-token for latency planning
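Because the leaderboard reports both output speed (tokens/sec) and time to first token, the two can be combined into a rough end-to-end latency estimate: total time is approximately TTFT plus output tokens divided by tokens per second. A sketch with hypothetical numbers:

```python
def estimated_latency(ttft_s: float, output_tokens: int, tokens_per_sec: float) -> float:
    """Rough wall-clock time for one response: first-token wait plus generation time."""
    return ttft_s + output_tokens / tokens_per_sec

# Hypothetical model: 0.4 s to first token, streaming at 120 tokens/s.
# A 300-token reply would take roughly 2.9 seconds end to end.
print(round(estimated_latency(0.4, 300, 120), 1))
```

For short replies TTFT dominates; for long ones, tokens/sec does, which is why both columns matter for latency planning.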
Common Use Cases
- Model selection: find the highest-quality model within your budget by sorting on the Value tab.
- Latency optimization: sort by Speed to find the fastest models for real-time applications like chat or autocomplete.
- Benchmark tracking: check back regularly to see how new model releases stack up against existing options.
- Stakeholder reporting: use the scatter chart to show leadership why a specific model offers the best quality-to-cost ratio.