Model Comparison

Compare popular AI models by performance, price, and features

GPT-5.2

OpenAI

Arena Elo (Text): 1436

GPQA Diamond: 92.4%

SWE-bench Verified: 80.0%

VS (GPQA Diamond)
Claude Opus 4.7

Anthropic

GDPval-AA Elo: 1753

SWE-bench Verified: 87.6%

SWE-bench Pro: 64.3%

GPT-5.2

OpenAI

Arena Elo (Text): 1436

GPQA Diamond: 92.4%

SWE-bench Verified: 80.0%

VS (ARC-AGI-2)
Gemini 3.0 Pro

Google DeepMind

Arena Elo: 1486

GPQA Diamond: 91.9%

SWE-bench Verified: 76.2%

Claude Opus 4.7

Anthropic

GDPval-AA Elo: 1753

SWE-bench Verified: 87.6%

SWE-bench Pro: 64.3%

VS (Agentic)
Gemini 3.0 Pro

Google DeepMind

Arena Elo: 1486

GPQA Diamond: 91.9%

SWE-bench Verified: 76.2%

DeepSeek V3.2

DeepSeek

Arena Elo: 1485

SWE-bench Verified: 80.8%

Input Price: $0.28/1M tokens

VS (Price/Performance)
GPT-5.2

OpenAI

Arena Elo (Text): 1436

GPQA Diamond: 92.4%

SWE-bench Verified: 80.0%

Qwen3.6-27B

Alibaba

SWE-bench Verified: 77.2%

AIME 2026: 94.1%

Input Price: $0.60/1M tokens

VS (Open-Weight)
Llama-3-Namazu-405B

Sakana AI

MMLU (5-shot): 88.6%

HumanEval (Coding): 89.0%

Input Price: $2.40/1M tokens

GPT-5.1 Codex Max

OpenAI

SWE-bench Verified (xhigh reasoning effort): 77.9%

SWE-Lancer IC SWE: 79.9%

Terminal-Bench 2.0: 58.1%

VS (Coding)
Claude Opus 4.7

Anthropic

GDPval-AA Elo: 1753

SWE-bench Verified: 87.6%

SWE-bench Pro: 64.3%

Grok 4.2 Beta

xAI

Chatbot Arena Elo: ~1493

IFBench (Instruction Following): 83%

Omniscience (Non-Hallucination): 78%

VS (Real-time)
GPT-5.2

OpenAI

Arena Elo (Text): 1436

GPQA Diamond: 92.4%

SWE-bench Verified: 80.0%

Gemma 4 31B

Google DeepMind

Arena Elo: 1451

GPQA Diamond: 84.3%

LiveCodeBench v6: 80.0%

VS (Local Deployment)
Qwen3.6-27B

Alibaba

SWE-bench Verified: 77.2%

AIME 2026: 94.1%

Input Price: $0.60/1M tokens