A comprehensive list of the best LLMs in the world, ranked by their performance, price, and features, updated daily.
Comprehensive testing across 57 subjects including mathematics, history, law, and medicine to evaluate LLM knowledge breadth.
Graduate-level expert knowledge evaluation designed to test advanced reasoning in specialized domains.
Software engineering tests including code generation, debugging, and algorithm design to measure programming capabilities.
Extended version of HumanEval with more complex programming challenges across multiple languages to test code quality.
Multi-turn benchmark evaluating conversational ability, reasoning, and instruction following across complex dialogues.
Grade school math word problems requiring multi-step reasoning to evaluate logical thinking and problem-solving capabilities.
The top LLMs for reasoning and problem solving, ranked by their performance on grade school math word problems.
The fastest LLMs ranked by tokens processed per second, measuring raw processing speed and efficiency.
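Tokens per second is simply the number of generated tokens divided by elapsed wall-clock time. As a rough sketch (the `generate` callable here is a hypothetical stand-in for any model call that returns a list of tokens, not an API from this site):

```python
import time

def measure_throughput(generate, prompt: str) -> float:
    """Return tokens generated per second for a single model call.

    `generate` is a placeholder for any function that takes a prompt
    and returns the list of generated tokens (assumption for this sketch).
    """
    start = time.perf_counter()
    tokens = generate(prompt)
    elapsed = time.perf_counter() - start
    return len(tokens) / elapsed
```

In practice, leaderboards average this over many requests, since a single call is noisy.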
The most affordable LLMs ranked by cost per token, helping you optimize your budget without compromising quality.
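Cost per token usually breaks down into separate input and output rates. A minimal sketch of the arithmetic, assuming the common convention of prices quoted in USD per million tokens:

```python
def request_cost(input_tokens: int, output_tokens: int,
                 input_price_per_m: float, output_price_per_m: float) -> float:
    """Estimate the USD cost of one request.

    Prices are assumed to be quoted per million tokens, a common
    (but not universal) pricing unit.
    """
    return (input_tokens * input_price_per_m
            + output_tokens * output_price_per_m) / 1_000_000
```

For example, a request with 1,500 input and 500 output tokens at $0.50/$1.50 per million costs $0.0015.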
The best LLMs for coding tasks, ranked by their performance on the HumanEval benchmark.
Compare any two LLM models side by side across different metrics, including MMLU, GPQA, HumanEval, DROP, Context Size, Parameters, Input Price, Output Price, Inference Speed, Throughput, and Latency.
Observe how different processing speeds affect real-time token generation.
Try adjusting the speeds using the number inputs above each panel ↑
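The side-by-side demo boils down to emitting tokens on a fixed interval derived from the chosen rate. A minimal sketch of that pacing loop (the function name and structure are illustrative, not the site's implementation):

```python
import time
from typing import Iterable, Iterator

def stream_tokens(tokens: Iterable[str], tokens_per_second: float) -> Iterator[str]:
    """Yield tokens at a fixed rate, mimicking real-time generation.

    A rate of 100 tokens/s means one token every 10 ms.
    """
    delay = 1.0 / tokens_per_second
    for tok in tokens:
        time.sleep(delay)
        yield tok
```

Running two of these generators concurrently at different rates reproduces the effect of the two panels above.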