OpenAI (Proprietary)

GPT-5.1 Codex Max

The top-tier version of OpenAI's coding-specialized model. It recorded 68.2 on SWE-bench Verified, demonstrating top-class performance in practical software development tasks.

Parameters

Undisclosed

Context

256K

License

Proprietary

Release date

2026-02-10

Japanese language capability

High-Quality JP

Multilingual model with strong Japanese language processing capabilities.

API pricing

Input price (per 1M tokens)

$2.5

Output price (per 1M tokens)

$15

Billing mode: standard

Strengths

  • Top-tier coding performance
  • Leading scores on SWE-bench Verified
  • 256K context for large codebases
  • 50% discount with Batch API

Weaknesses

  • Inferior to GPT-5.2 for general text generation
  • Not ideal for non-coding tasks
  • Slightly high cost

Use cases

  • Large-scale code generation
  • Refactoring assistance
  • Multi-file debugging
  • Integration into CI/CD pipelines

In-depth analysis

SWE-bench Verified (xhigh)

77.9%

vs Claude Opus 4.5: 80.9%

SWE-Lancer IC SWE

79.9%

Significant improvement over GPT-5.1-Codex

Terminal-Bench 2.0

58.1%

vs Gemini 3 Pro: 54.2%

Input Price

$1.25/1M

Cached: $0.625/1M

Output Price

$10/1M

Premium tier for high-quality output

Context Window

400K

Unlimited via compaction

Arena Elo

1349

#27 overall (BenchLM provisional)

Strengths

  • Enables 24+ hour autonomous coding sessions via context compaction technology
  • 30% more token-efficient than predecessor at same reasoning effort
  • Best-in-class for long-horizon software engineering tasks and repository-scale refactoring

Weaknesses

  • High output token cost ($10/1M) can accumulate quickly in automated pipelines
  • Not optimized for creative writing, marketing copy, or non-coding tasks
  • Compaction may cause nuanced detail loss over extremely long sessions
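
To make the cost concern concrete, the arithmetic can be sketched as below. This is a minimal estimate that assumes the per-token prices quoted in this analysis ($1.25 input, $0.625 cached input, $10 output per 1M tokens) and linear billing. The `Usage` structure and the function names are hypothetical, introduced only for illustration.

```python
from dataclasses import dataclass

# USD per 1M tokens, copied from the figures quoted in this analysis.
INPUT_PER_M = 1.25
CACHED_INPUT_PER_M = 0.625
OUTPUT_PER_M = 10.00


@dataclass
class Usage:
    """Token counts for one request (hypothetical structure for illustration)."""
    input_tokens: int
    cached_input_tokens: int  # portion of input_tokens billed at the cached rate
    output_tokens: int


def request_cost(usage: Usage) -> float:
    """Estimate the USD cost of a single request."""
    if not 0 <= usage.cached_input_tokens <= usage.input_tokens:
        raise ValueError("cached_input_tokens must be between 0 and input_tokens")
    uncached = usage.input_tokens - usage.cached_input_tokens
    total = (
        uncached * INPUT_PER_M
        + usage.cached_input_tokens * CACHED_INPUT_PER_M
        + usage.output_tokens * OUTPUT_PER_M
    )
    return total / 1_000_000


def pipeline_cost(usage: Usage, runs: int) -> float:
    """Estimate the cost of repeating the same request `runs` times."""
    return request_cost(usage) * runs


if __name__ == "__main__":
    typical = Usage(input_tokens=200_000, cached_input_tokens=150_000, output_tokens=20_000)
    print(f"per run: ${request_cost(typical):.4f}")
    print(f"1,000 runs: ${pipeline_cost(typical, 1000):.2f}")
```

Under these assumptions, a run with 200K input tokens (150K of them cached) and 20K output tokens costs about $0.36, of which $0.20 is output, so output volume is the first thing to watch in an automated pipeline.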

Competitor comparison

Model               Arena Elo  SWE-bench Verified  GPQA  Price
GPT-5.1-Codex-Max   1349       77.9%               N/A   $1.25/$10.00
Claude Opus 4.5     N/A        80.9%               N/A   $17/month+
Gemini 3 Pro        N/A        76.2%               N/A   N/A

GPT-5.1-Codex-Max is OpenAI's frontier model specialized for autonomous software engineering. Released November 19, 2025, it builds on the GPT-5.1 foundation with additional training for agentic coding tasks. Its defining innovation is "context compaction": the model is natively trained to operate coherently across multiple context windows, which lets it sustain work over millions of tokens for hours or even days. This moves it beyond code completion toward autonomous development workflows.
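
The general idea behind compaction can be illustrated with a toy client-side loop. This is only a conceptual sketch and not OpenAI's mechanism, which the text describes as native to the model's training. The token estimate, the `compact_history` helper, and the placeholder summarizer are all assumptions made for illustration.

```python
from typing import Callable, List


def estimate_tokens(text: str) -> int:
    """Crude estimate (about 4 characters per token); an assumption, not a real tokenizer."""
    return max(1, len(text) // 4)


def compact_history(
    history: List[str],
    budget: int,
    summarize: Callable[[List[str]], str],
    keep_recent: int = 2,
) -> List[str]:
    """Fold older entries into one summary when the history exceeds the token budget."""
    total = sum(estimate_tokens(m) for m in history)
    if total <= budget or len(history) <= keep_recent:
        return history
    split = len(history) - keep_recent
    old, recent = history[:split], history[split:]
    # The summary replaces the older entries; recent entries are kept verbatim.
    return [summarize(old)] + recent


def naive_summarize(messages: List[str]) -> str:
    """Placeholder summarizer: keeps the first 40 characters of each message."""
    return "SUMMARY: " + " | ".join(m[:40] for m in messages)


if __name__ == "__main__":
    log = ["a" * 400, "b" * 400, "c" * 40, "d" * 40]
    for entry in compact_history(log, budget=100, summarize=naive_summarize):
        print(entry[:60])
```

The placeholder summarizer simply truncates each message, which also shows how compaction can drop nuanced detail: whatever the summary step omits is no longer available to later steps.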

Positioned as the default model in OpenAI's Codex ecosystem (CLI, IDE extensions, cloud), Codex-Max targets professional developers and engineering teams that handle project-scale refactors, deep debugging sessions, and long-running agent loops. Its benchmark scores are strong, but its main value lies in operational longevity and token efficiency: it uses 30% fewer thinking tokens than its predecessor at equivalent performance. The model is explicitly not a general-purpose chatbot; it is engineered for Codex-like environments and works best when paired with development tools.
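
As a rough illustration of what the 30% figure could mean in practice, the sketch below converts a baseline thinking-token count into tokens and dollars saved. It assumes thinking tokens are billed at the $10/1M output rate quoted in this analysis and that the reduction applies uniformly; both are simplifications, and the function names are hypothetical.

```python
OUTPUT_PER_M = 10.00   # USD per 1M output tokens, as quoted in this analysis
REDUCTION_PCT = 30     # "30% fewer thinking tokens" at equivalent performance


def thinking_tokens_after(baseline_tokens: int, reduction_pct: int = REDUCTION_PCT) -> int:
    """Thinking tokens expected after the reduction (integer arithmetic, rounded down)."""
    return baseline_tokens * (100 - reduction_pct) // 100


def savings_usd(
    baseline_tokens: int,
    reduction_pct: int = REDUCTION_PCT,
    price_per_m: float = OUTPUT_PER_M,
) -> float:
    """USD saved, assuming thinking tokens are billed at the output rate."""
    saved = baseline_tokens - thinking_tokens_after(baseline_tokens, reduction_pct)
    return saved * price_per_m / 1_000_000


if __name__ == "__main__":
    baseline = 1_000_000  # hypothetical predecessor usage for one long task
    print(thinking_tokens_after(baseline), "thinking tokens after reduction")
    print(f"${savings_usd(baseline):.2f} saved")
```

For a hypothetical task where the predecessor spends 1M thinking tokens, this works out to 700K tokens and $3.00 saved under the stated assumptions.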

On SWE-bench Verified, Codex-Max trails Anthropic's Claude Opus 4.5 slightly (77.9% vs 80.9%) while staying ahead of Google's Gemini 3 Pro (76.2%), and it also leads Gemini 3 Pro on Terminal-Bench 2.0 (58.1% vs 54.2%). Its main differentiators against competitors such as Gemini 3 Pro are long-horizon autonomy, native Windows support, and integration with OpenAI's developer ecosystem. Pricing reflects its premium positioning: output costs $10 per 1M tokens, significantly higher than general-purpose models but justified for high-value software engineering work.

Analysis generated: 2026-05-23