AI Model Rankings

Comprehensive AI model rankings across 17 benchmarks. Detailed comparisons by category.

Comprehensive Ranking

Overall AI model ranking across HLE, ARC-AGI-2, FrontierMath, SWE-bench Verified, and τ²-Bench.

5 benchmarks

Coding Capability

Programming ability benchmarks: SWE-bench Verified, LiveCodeBench, SWE-bench Pro, Aider-Polyglot.

4 benchmarks

Math Capability

Mathematical reasoning benchmarks: AIME 2025/2026, FrontierMath, MATH-500, GSM8K.

5 benchmarks

AI Agent Capability

Autonomous agent benchmarks: τ²-Bench, Terminal Bench Hard, Aider-Polyglot.

3 benchmarks

Reasoning Capability

Reasoning and thinking benchmarks: HLE, ARC-AGI-2, GPQA Diamond.

3 benchmarks

General Performance

General AI performance: MMLU-Pro, LMArena Elo ratings.

2 benchmarks

OpenClaw Ranking

OpenClaw agent performance: Claw Bench and Pinch Bench.

2 benchmarks

Comprehensive Ranking

Overall scores across HLE, ARC-AGI-2, FrontierMath, SWE-bench Verified, and τ²-Bench

786 models

#	Model	Developer	HLE	ARC-AGI-2	FrontierMath	SWE-bench Verified	τ²-Bench	Open Source
1	Claude Mythos Preview	Anthropic	64.7	—	—	93.9	—	Closed
2	Claude Fable 5	Anthropic	59.0	—	—	95.0	—	Closed
3	GPT-5.4 Pro	OpenAI	58.7	83.3	38.0	—	—	Closed
4	Muse Spark	Meta AI	58.0	42.5	14.6	77.4	—	Closed
5	Claude Opus 4.8	Anthropic	57.9	—	—	88.6	—	Closed
6	Claude Sonnet 5	Anthropic	57.4	—	—	85.2	—	Closed
7	GPT-5.5 Pro	OpenAI	57.2	84.6	39.6	—	—	Closed
8	GLM-5.2	Zhipu AI	54.7	—	—	—	—	Closed
9	Opus 4.7	Anthropic	54.7	75.8	22.9	87.6	—	Closed
10	Kimi K2.6	Moonshot AI	54.0	—	—	80.2	—	Closed
11	Qwen3.7-Max-Preview	Alibaba	53.5	—	—	80.4	—	Closed
12	Claude Opus 4.6	Anthropic	53.0	66.3	22.9	80.8	91.9	Closed
13	GLM 5.1	Zhipu AI	52.3	—	—	—	—	Closed
14	GPT-5.5	OpenAI	52.2	85.0	35.4	—	—	Closed
15	GPT-5.4	OpenAI	52.1	77.1	27.1	—	—	Closed
16	Gemini 3.1 Pro Preview	Google DeepMind	51.4	77.1	16.7	80.6	90.8	Closed
17	Kimi K2 Thinking	Moonshot AI	51.0	—	—	71.3	—	Closed
18	Qwen 3.6 Plus Preview	Alibaba	50.6	—	—	78.8	—	Closed
19	GLM-5	Zhipu AI	50.4	4.9	2.1	77.8	89.7	Closed
20	Kimi K2.5	Moonshot AI	50.2	11.8	4.2	76.8	—	Closed
21	Qwen3.6-Max-Preview	Alibaba	50.2	—	—	78.8	—	Closed
22	GPT-5.2 Pro	OpenAI	50.0	54.2	31.3	—	—	Closed
23	Qwen3-Max-Thinking	Alibaba	49.8	—	—	75.3	82.1	Closed
24	Claude Sonnet 4.6	Anthropic	49.0	58.3	8.3	79.6	—	Closed
25	Qwen3.5-27B	Alibaba	48.5	—	—	72.4	79.0	Closed
26	Gemini 3 Deep Think - 2620	Google DeepMind	48.4	84.6	—	—	—	Closed
27	Qwen3.5-397B-A17B	Alibaba	48.3	—	—	76.4	86.7	Closed
28	DeepSeek-V4-Pro	DeepSeek	48.2	—	—	80.6	—	Closed
29	Gemini 3.0 Pro (Preview 11-2025)	Google DeepMind	45.8	45.1	18.8	76.2	85.4	Closed
30	GPT-5.2	OpenAI	45.5	54.2	18.8	80.0	82.0	Closed
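
Many rows above report only some of the five benchmarks, with a dash marking a missing score. The following is a minimal sketch of how rows in this tab-separated layout could be read so that missing scores stay distinct from zeros. It assumes the five score columns follow the order in which the benchmarks are named in the section description (HLE, ARC-AGI-2, FrontierMath, SWE-bench Verified, τ²-Bench). The names `parse_score`, `parse_row`, and `coverage` are hypothetical helpers introduced for illustration and are not part of the ranking site.

```python
from typing import Optional

# Assumed column order: taken from the benchmark list in the section
# description, not from any documented schema.
BENCHMARKS = ["HLE", "ARC-AGI-2", "FrontierMath", "SWE-bench Verified", "τ²-Bench"]


def parse_score(cell: str) -> Optional[float]:
    """Return the score as a float, or None when the cell is a dash or empty."""
    cell = cell.strip()
    return None if cell in ("—", "") else float(cell)


def parse_row(line: str) -> dict:
    """Split one tab-separated table row into named fields."""
    fields = line.rstrip("\n").split("\t")
    scores = {name: parse_score(cell) for name, cell in zip(BENCHMARKS, fields[3:8])}
    return {
        "rank": int(fields[0]),
        "model": fields[1],
        "developer": fields[2],
        "scores": scores,
        "open_source": fields[8],
    }


def coverage(row: dict) -> int:
    """Count how many of the five benchmarks have a reported score."""
    return sum(value is not None for value in row["scores"].values())


# Two rows copied from the table above.
SAMPLE = "\n".join([
    "12\tClaude Opus 4.6\tAnthropic\t53.0\t66.3\t22.9\t80.8\t91.9\tClosed",
    "3\tGPT-5.4 Pro\tOpenAI\t58.7\t83.3\t38.0\t—\t—\tClosed",
])
rows = [parse_row(line) for line in SAMPLE.splitlines()]

for row in rows:
    print(row["rank"], row["model"], f"{coverage(row)}/5 benchmarks reported")
```

Keeping missing scores as `None` rather than `0.0` matters when comparing models: a model with two reported scores and one with five are not directly comparable on any average taken across the columns.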