Comprehensive Ranking

Overall AI model ranking across HLE, ARC-AGI-2, FrontierMath, SWE-bench Verified, and τ²-Bench.

698 models

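| # | Model | Developer | Scores | Open Source |
|---|-------|-----------|--------|-------------|
| 1 | Claude Mythos Preview | Anthropic | 64.7, 93.9 | Closed |
| 2 | GPT-5.4 Pro | OpenAI | 58.7, 83.3, 38.0 | Closed |
| 3 | Muse Spark | Meta AI | 58.0, 42.5, 14.6, 77.4 | Closed |
| 4 | GPT-5.5 Pro | OpenAI | 57.2, 84.6, 39.6 | Closed |
| 5 | Opus 4.7 | Anthropic | 54.7, 75.8, 22.9, 87.6 | Closed |
| 6 | Kimi K2.6 | Moonshot AI | 54.0, 80.2 | Closed |
| 7 | Qwen3.7-Max-Preview | Alibaba | 53.5, 80.4 | Closed |
| 8 | Claude Opus 4.6 | Anthropic | 53.0, 66.3, 22.9, 80.8, 91.9 | Closed |
| 9 | GLM 5.1 | Zhipu AI | 52.3 | Closed |
| 10 | GPT-5.5 | OpenAI | 52.2, 85.0, 35.4 | Closed |
| 11 | GPT-5.4 | OpenAI | 52.1, 77.1, 27.1 | Closed |
| 12 | Gemini 3.1 Pro Preview | Google DeepMind | 51.4, 77.1, 16.7, 80.6, 90.8 | Closed |
| 13 | Kimi K2 Thinking | Moonshot AI | 51.0, 71.3 | Closed |
| 14 | Qwen 3.6 Plus Preview | Alibaba | 50.6, 78.8 | Closed |
| 15 | GLM-5 | Zhipu AI | 50.4, 4.9, 2.1, 77.8, 89.7 | Closed |
| 16 | Kimi K2.5 | Moonshot AI | 50.2, 11.8, 4.2, 76.8 | Closed |
| 17 | Qwen3.6-Max-Preview | Alibaba | 50.2, 78.8 | Closed |
| 18 | GPT-5.2 Pro | OpenAI | 50.0, 54.2, 31.3 | Closed |
| 19 | Qwen3-Max-Thinking | Alibaba | 49.8, 75.3, 82.1 | Closed |
| 20 | Claude Sonnet 4.6 | Anthropic | 49.0, 58.3, 8.3, 79.6 | Closed |
| 21 | Qwen3.5-27B | Alibaba | 48.5, 72.4, 79.0 | Closed |
| 22 | Gemini 3 Deep Think - 2620 | Google DeepMind | 48.4, 84.6 | Closed |
| 23 | Qwen3.5-397B-A17B | Alibaba | 48.3, 76.4, 86.7 | Closed |
| 24 | DeepSeek-V4-Pro | DeepSeek | 48.2, 80.6 | Closed |
| 25 | Gemini 3.0 Pro (Preview 11-2025) | Google DeepMind | 45.8, 45.1, 18.8, 76.2, 85.4 | Closed |
| 26 | GPT-5.2 | OpenAI | 45.5, 54.2, 18.8, 80.0, 82.0 | Closed |
| 27 | DeepSeek-V4-Flash | DeepSeek | 45.1, 79.0 | Closed |
| 28 | Grok 4 Heavy | xAI | 44.4, 2.1, 73.5 | Closed |
| 29 | Gemini 3.0 Flash | Google DeepMind | 43.5, 33.6, 4.2, 68.7, 90.2 | Closed |
| 30 | Opus 4.5 | Anthropic | 43.2, 37.6, 4.2, 80.9, 82.0 | Closed |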

About Benchmarks

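HLE
General intelligence test: measures human-level reasoning ability
ARC-AGI-2
Abstract reasoning benchmark: measures the ability to generalize to novel patterns
FrontierMath - Tier 4
Advanced mathematics problems: measures research-level mathematical reasoning ability
SWE-bench Verified
Practical software development tasks: measures the ability to fix real-world bugs
τ²-Bench
Autonomous agent tasks: measures the ability to combine tool calling with reasoning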