A comprehensive list of the best LLMs in the world, ranked by their performance, price, and features, updated daily.
Comprehensive testing across 57 subjects including mathematics, history, law, and medicine to evaluate LLM knowledge breadth.
Graduate-level expert knowledge evaluation designed to test advanced reasoning in specialized domains.
Software engineering tests including code generation, debugging, and algorithm design to measure programming capabilities.
Extended version of HumanEval with more complex programming challenges across multiple languages to test code quality.
Multi-turn benchmark evaluating conversational ability, reasoning, and instruction following across complex dialogues.
Grade school math word problems requiring multi-step reasoning to evaluate logical thinking and problem-solving capabilities.
The top LLMs for reasoning and problem solving, ranked by their performance on grade school math word problems.
The fastest LLMs ranked by tokens processed per second, measuring raw processing speed and efficiency.
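Tokens per second is simply the number of generated tokens divided by elapsed wall-clock time. As a rough sketch (the `generate` callable here is a hypothetical stand-in for any model call that returns a list of tokens, not an API from this site):

```python
import time

def measure_throughput(generate, prompt: str) -> float:
    """Return tokens generated per second for a single model call.

    `generate` is a placeholder for any function that takes a prompt
    and returns the list of generated tokens (assumption for this sketch).
    """
    start = time.perf_counter()
    tokens = generate(prompt)
    elapsed = time.perf_counter() - start
    return len(tokens) / elapsed
```

In practice, leaderboards average this over many requests, since a single call is noisy.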
The most affordable LLMs ranked by cost per token, helping you optimize your budget without compromising quality.
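Cost per token usually breaks down into separate input and output rates. A minimal sketch of the arithmetic, assuming the common convention of prices quoted in USD per million tokens:

```python
def request_cost(input_tokens: int, output_tokens: int,
                 input_price_per_m: float, output_price_per_m: float) -> float:
    """Estimate the USD cost of one request.

    Prices are assumed to be quoted per million tokens, a common
    (but not universal) pricing unit.
    """
    return (input_tokens * input_price_per_m
            + output_tokens * output_price_per_m) / 1_000_000
```

For example, a request with 1,500 input and 500 output tokens at $0.50/$1.50 per million costs $0.0015.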
The best LLMs for coding tasks, ranked by their performance on the HumanEval benchmark.
Compare any two LLM models side by side across different metrics, including MMLU, GPQA, HumanEval, DROP, Context Size, Parameters, Input Price, Output Price, Inference Speed, Throughput, and Latency.
Observe how different processing speeds affect real-time token generation.
Try adjusting the speeds using the number inputs above each panel ↑
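The side-by-side demo boils down to emitting tokens on a fixed interval derived from the chosen rate. A minimal sketch of that pacing loop (the function name and structure are illustrative, not the site's implementation):

```python
import time
from typing import Iterable, Iterator

def stream_tokens(tokens: Iterable[str], tokens_per_second: float) -> Iterator[str]:
    """Yield tokens at a fixed rate, mimicking real-time generation.

    A rate of 100 tokens/s means one token every 10 ms.
    """
    delay = 1.0 / tokens_per_second
    for tok in tokens:
        time.sleep(delay)
        yield tok
```

Running two of these generators concurrently at different rates reproduces the effect of the two panels above.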