
General Performance

General AI performance, measured by MMLU-Pro and LMArena Elo ratings.

698 models

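| # | Model | Developer | Score | (unlabeled) | Open Source |
|---|-------|-----------|-------|-------------|-------------|
| 1 | OpenAI o1 | OpenAI | 91.0 | 1.0 | Closed |
| 2 | Gemini 3.0 Pro (Preview 11-2025) | Google DeepMind | 90.0 | 1.0 | Closed |
| 3 | Opus 4.5 | Anthropic | 90.0 | | Closed |
| 4 | Qwen3.7-Max-Preview | Alibaba | 89.6 | | Closed |
| 5 | Qwen 3.6 Plus Preview | Alibaba | 88.5 | 1.0 | Closed |
| 6 | Qwen3.6-Max-Preview | Alibaba | 88.5 | 1.0 | Closed |
| 7 | Claude Sonnet 4.5 | Anthropic | 88.0 | 1.0 | Closed |
| 8 | M2.1 | MiniMax | 88.0 | 1.0 | Closed |
| 9 | Opus 4.1 | Anthropic | 88.0 | 1.0 | Closed |
| 10 | Qwen3.5-397B-A17B | Alibaba | 87.8 | 1.0 | Closed |
| 11 | Hunyuan-T1 | Tencent AI Lab | 87.2 | 1.0 | Closed |
| 12 | DeepSeek-V4-Pro | DeepSeek | 87.1 | 1.0 | Closed |
| 13 | Grok 4 | xAI | 87.0 | 1.0 | Closed |
| 14 | DeepSeek-V4-Flash | DeepSeek | 86.2 | 1.0 | Closed |
| 15 | Qwen3.6-27B | Alibaba | 86.2 | | Closed |
| 16 | Qwen3.5-27B | Alibaba | 86.1 | 1.0 | Closed |
| 17 | GPT-4.5 | OpenAI | 86.1 | 1.0 | Closed |
| 18 | Gemini 2.5-Pro | Google DeepMind | 86.0 | | Closed |
| 19 | Qwen3-Max-Thinking | Alibaba | 85.7 | | Closed |
| 20 | OpenAI o3 | OpenAI | 85.6 | 1.0 | Closed |
| 21 | Gemma 4 31B | Google DeepMind | 85.2 | 1.0 | Closed |
| 22 | Qwen3.6-35B-A3B | Alibaba | 85.2 | | Closed |
| 23 | DeepSeek-V3.1 Terminus | DeepSeek | 85.0 | 1.0 | Closed |
| 24 | DeepSeek V3.2-Exp | DeepSeek | 85.0 | 1.0 | Closed |
| 25 | DeepSeek-R1-0528 | DeepSeek | 85.0 | 1.0 | Closed |
| 26 | Grok 4.1 Fast | xAI | 85.0 | | Closed |
| 27 | DeepSeek-V3.1 | DeepSeek | 85.0 | 1.0 | Closed |
| 28 | Claude Opus 4 | Anthropic | 85.0 | 1.0 | Closed |
| 29 | GLM-4.5 | Zhipu AI | 84.6 | 1.0 | Closed |
| 30 | Claude Mythos Preview | Anthropic | | | Closed |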

About Benchmarks

MMLU-Pro
Massive Multitask Language Understanding Pro: measures understanding across a broad range of knowledge domains.
LMArena Elo
Elo ratings from LMArena (formerly Chatbot Arena): an overall evaluation based on anonymous, blind tests judged by users.
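To make the LMArena Elo entry concrete: an Elo system turns pairwise votes into ratings by comparing each actual outcome with the outcome the current ratings predict. The Python below is a minimal sketch of the classic online Elo update, not LMArena's actual pipeline (its published methodology has described fitting a Bradley-Terry model over all votes instead). The function names, the K-factor of 32, the 1000-point starting rating, and the model names are illustrative assumptions.

```python
# A minimal sketch of a classic online Elo update, assuming pairwise
# "battles" between two models where a user votes for A, for B, or a tie.
# K = 32 and the 1000-point starting rating are illustrative assumptions,
# not LMArena's actual parameters.


def expected_score(rating_a: float, rating_b: float) -> float:
    """Probability that A beats B under the logistic Elo model (base 10, 400-point scale)."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def elo_update(rating_a: float, rating_b: float, outcome_a: float,
               k: float = 32.0) -> tuple[float, float]:
    """Return updated (rating_a, rating_b).

    outcome_a is 1.0 if A wins, 0.0 if B wins, and 0.5 for a tie.
    Whatever A gains, B loses, so the total rating is conserved.
    """
    delta = k * (outcome_a - expected_score(rating_a, rating_b))
    return rating_a + delta, rating_b - delta


def rate_battles(battles, initial: float = 1000.0,
                 k: float = 32.0) -> dict[str, float]:
    """Apply elo_update to each (model_a, model_b, outcome_a) battle in order."""
    ratings: dict[str, float] = {}
    for model_a, model_b, outcome_a in battles:
        ra = ratings.get(model_a, initial)
        rb = ratings.get(model_b, initial)
        ratings[model_a], ratings[model_b] = elo_update(ra, rb, outcome_a, k)
    return ratings


if __name__ == "__main__":
    # Hypothetical model names and votes, for illustration only.
    battles = [
        ("model-x", "model-y", 1.0),  # user preferred model-x
        ("model-y", "model-z", 0.5),  # tie
        ("model-x", "model-z", 1.0),  # user preferred model-x
    ]
    ranked = sorted(rate_battles(battles).items(), key=lambda kv: -kv[1])
    for name, rating in ranked:
        print(f"{name}: {rating:.1f}")
```

Because each vote nudges the ratings in sequence, an online Elo result depends on the order in which battles are processed; fitting a Bradley-Terry model to all votes at once removes that order dependence, which is one plausible reason a leaderboard would prefer it.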