Model Comparison
Compare popular AI models by performance, price, and features
Categories: GPQA Diamond, ARC-AGI-2, Agentic, Price/Perf, Open-Weight, Coding, Real-time, Local Deploy

GPT-5.1 Codex Max
OpenAI
SWE-bench Verified (xhigh): 77.9%
SWE-Lancer IC SWE: 79.9%
Terminal-Bench 2.0: 58.1%

Grok 4.2 Beta
xAI
Chatbot Arena Elo: ~1493
IFBench (Instruction Following): 83%
Omniscience (Non-Hallucination): 78%