
Math Capability

Mathematical reasoning benchmarks: AIME 2025/2026, FrontierMath, MATH-500, GSM8K.

698 models

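| # | Model | Developer | Score(s) | Open Source |
|---|-------|-----------|----------|-------------|
| 1 | Step 3.5 Flash | StepFun | 97.3 | Closed |
| 2 | DeepSeek V3.2 Speciale | DeepSeek | 96.0 | Closed |
| 3 | DeepSeek V3.2 | DeepSeek | 93.1, 92.7, 2.1 | Closed |
| 4 | o3-pro | OpenAI | 93.0 | Closed |
| 5 | Qwen3-235B-A22B-Thinking | Alibaba | 92.3 | Open |
| 6 | Grok 4 Fast | xAI | 92.0 | Closed |
| 7 | GLM-4.7-Flash | Zhipu AI | 91.6 | Closed |
| 8 | Grok 4.1 Fast | xAI | 89.0 | Closed |
| 9 | DeepSeek-R1-0528 | DeepSeek | 87.5, 98.0 | Closed |
| 10 | MiniMax M2.5 | MiniMax | 86.3 | Closed |
| 11 | Intern-S1 | Shanghai AI Laboratory | 86.0 | Open |
| 12 | Gemini-2.5-Pro-Preview-05-06 | Google DeepMind | 83.0, 2.1, 98.8 | Closed |
| 13 | GPT OSS 120B | OpenAI | 83.0 | Closed |
| 14 | Step3 | StepFun | 82.9 | Open |
| 15 | Qwen3-4B-Thinking-2507 | Alibaba | 81.3 | Open |
| 16 | M2.1 | MiniMax | 81.0 | Closed |
| 17 | Qwen3 Max (Preview) | Alibaba | 80.6 | Closed |
| 18 | GPT OSS 20B | OpenAI | 79.0 | Closed |
| 19 | MiniMax M2 | MiniMax | 78.0 | Closed |
| 20 | MiniMax-M1-80k | MiniMax | 76.9, 96.8 | Closed |
| 21 | Hunyuan-A13B-Instruct | Tencent AI Lab | 76.8, 91.8 | Closed |
| 22 | Hunyuan-7B | Tencent | 75.3, 93.7 | Closed |
| 23 | Kimi K2 0905 | Moonshot AI | 75.2 | Closed |
| 24 | MiniMax-M1-40k | MiniMax | 74.6, 96.0 | Closed |
| 25 | Qwen3-235B-A22B-2507 | Alibaba | 70.3 | Closed |
| 26 | DeepSeek-R1 | DeepSeek | 70.0, 97.3 | Closed |
| 27 | Qwen3-Next | Alibaba | 69.5, 90.3 | Closed |
| 28 | Pangu Pro MoE | Huawei | 68.1, 96.8 | Closed |
| 29 | Magistral-Medium-2506 | Mistral | 65.0 | Closed |
| 30 | Gemini 2.5 Flash-Lite | Google DeepMind | 63.1 | Closed |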

About Benchmarks

AIME 2025
American Invitational Mathematics Examination 2025: a high-school-level mathematics competition.
AIME 2026
American Invitational Mathematics Examination 2026: a high-school-level mathematics competition.
FrontierMath - Tier 4
Advanced mathematics problems: measures research-level mathematical reasoning ability.
MATH-500
Mathematics problem set: measures problem-solving ability across a wide range of mathematical fields.
GSM8K
Grade School Math 8K: measures elementary-school-level mathematical reasoning ability.