Cursor · Proprietary

Composer 2.5

Composer 2.5 is a programming-focused model developed by Cursor. Its 200K-token context window supports code generation and comprehension across large codebases.

Parameters

Undisclosed

Context

200K tokens

License

Proprietary

Release date

2026-05-18

API pricing

API pricing for this model has not been publicly disclosed.

Strengths

  • Specialized for programming
  • Long 200K context understanding
  • Efficient code generation capabilities

Weaknesses

  • Non-open-source license
  • Closed model with usage restrictions
  • Limited adaptability to general tasks

Use cases

  • Analysis of large codebases
  • Automatic generation of complex programs
  • Advanced code refactoring

In-depth analysis

Coding Agent Index

62

#3 overall, behind Opus 4.7 (66) and GPT-5.5 (65)

SWE-Bench Verified

80.8%

vs GPT-5.5: 82.6%, vs Opus 4.7: ~80.5%

Per-Task Cost (Standard)

$0.07

~60x cheaper than GPT-5.5 xhigh ($4.82)

Per-Task Cost (Fast)

$0.44

~10x cheaper than frontier rivals

Context Window

200K–1M tokens

200K tokens practical in the IDE; 1M tokens theoretical maximum

Avg Wall Time (Fast)

6.7 min/task

3rd fastest agent on Coding Agent Index
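
The per-task cost multiples above can be checked with a few lines of arithmetic. The sketch below uses only the dollar figures listed in this section. The 1,000-task batch size and the assumption that cost scales linearly with task count are illustrative, not from the source. From the listed figures the ratios come out near 69x (standard) and 11x (fast), the same order as the "~60x" and "~10x" labels.

```python
# Per-task costs in USD, copied from the figures listed above.
COST_PER_TASK = {
    "Composer 2.5 (Standard)": 0.07,
    "Composer 2.5 (Fast)": 0.44,
    "GPT-5.5 xhigh": 4.82,
}


def cost_ratio(expensive: str, cheap: str) -> float:
    """Return how many times more the first configuration costs per task."""
    return COST_PER_TASK[expensive] / COST_PER_TASK[cheap]


def batch_cost(config: str, tasks: int) -> float:
    """Total cost of a batch, assuming cost scales linearly with task count."""
    return COST_PER_TASK[config] * tasks


if __name__ == "__main__":
    for cheap in ("Composer 2.5 (Standard)", "Composer 2.5 (Fast)"):
        ratio = cost_ratio("GPT-5.5 xhigh", cheap)
        print(f"GPT-5.5 xhigh vs {cheap}: {ratio:.1f}x")

    # 1,000 tasks is a hypothetical workload size chosen for illustration.
    for config in COST_PER_TASK:
        print(f"{config}: ${batch_cost(config, 1000):,.2f} per 1,000 tasks")
```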

Strengths

  • Best cost-to-quality ratio among coding agents scoring above 60 on the Coding Agent Index, at roughly $0.07 per task on the standard tier
  • Massive benchmark gains over Composer 2: +35 points on SWE-Bench-Pro-Hard-AA, near-parity with Opus 4.7 on SWE-Bench Multilingual
  • Purpose-built for agentic IDE workflows with improved long-running task reliability and effort calibration via targeted RL

Weaknesses

  • Exclusively locked to Cursor IDE/CLI — no public API, no multi-provider portability, significant vendor lock-in risk
  • Terminal-Bench 2.0 trails GPT-5.5 by 13 points (69.3% vs 82.7%), underperforming on shell-driven autonomous workflows
  • Early reports of agent-mode inconsistency — switching to 'ask mode' mid-task and forgetting pipeline context on long rollouts

Competitor comparison

| Model | Arena | SWE | GPQA | Price |
|---|---|---|---|---|
| Claude Opus 4.7 | 66 (Index) | 80.5% | ~64.8% (CursorBench max) | $5/$25 per M tokens |
| GPT-5.5 (xhigh) | 65 (Index) | 82.6% (Verified) | 82.7% (Terminal-Bench) | $5–$10/$30–$45 per M tokens |
| Composer 2 (predecessor) | 48 (Index) | 73.7% (Multilingual) | 61.7% (Terminal-Bench) | $0.50/$2.50 per M tokens |

Composer 2.5 is Cursor's latest purpose-built coding agent model, released May 2026, representing a significant leap from its predecessor Composer 2. Built on Moonshot AI's open-weight Kimi K2.5 foundation with approximately 85% of total compute dedicated to Cursor's own post-training and reinforcement learning, Composer 2.5 achieves near-frontier coding performance at a fraction of the cost. It ranks third on the Artificial Analysis Coding Agent Index with a score of 62, trailing only max-effort configurations of Claude Opus 4.7 (66) and GPT-5.5 (65) that cost 10–60× more per task.

The model's core innovation lies in its training methodology: targeted RL with textual feedback inserts localized hints at the exact point of error in long agent rollouts, solving the credit-assignment problem that plagues traditional reinforcement learning over hundreds of thousands of tokens. Combined with 25× more synthetic tasks (including novel 'feature deletion' exercises where the model must reimplement stripped functionality), Composer 2.5 achieves a dramatic 35-point gain on SWE-Bench-Pro-Hard-AA (12% → 47%) and near-parity with Opus 4.7 on SWE-Bench Multilingual (79.8% vs 80.5%).
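
To make the idea of a localized hint concrete, here is a toy sketch of inserting textual feedback at the first error in a recorded rollout. It is not Cursor's implementation: the source does not describe the rollout format, how errors are detected, or how hints are written, so the `Step` structure, the `ok` flag, and both helper functions are hypothetical names introduced only for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Step:
    """One entry in a (hypothetical) agent rollout."""
    role: str        # "agent", "tool", or "feedback"
    content: str
    ok: bool = True  # False marks a step that an assumed checker flagged as an error


def first_error_index(rollout: List[Step]) -> Optional[int]:
    """Return the index of the first flagged step, or None if nothing was flagged."""
    for i, step in enumerate(rollout):
        if not step.ok:
            return i
    return None


def insert_hint(rollout: List[Step], hint: str) -> List[Step]:
    """Return a copy of the rollout with a feedback step right after the first error.

    The hint sits at the point of failure instead of at the end of the
    trajectory, so the feedback is attached to the step that caused the problem.
    """
    idx = first_error_index(rollout)
    if idx is None:
        return list(rollout)
    return rollout[: idx + 1] + [Step("feedback", hint)] + rollout[idx + 1:]


if __name__ == "__main__":
    rollout = [
        Step("agent", "Edit parser.py to handle empty input"),
        Step("tool", "pytest: 3 failed", ok=False),
        Step("agent", "Proceed to update the docs"),
    ]
    patched = insert_hint(rollout, "Tests failed after the edit; fix them before moving on.")
    for step in patched:
        print(f"[{step.role}] {step.content}")
```

In this toy version the error position is given by a flag on each step; in a real training pipeline, locating the error and wording the hint would be the hard part, and the source gives no detail on either.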

Composer 2.5 is positioned as the cost-efficient option for teams running high-volume coding agent workloads: its standard tier, at $0.50/$2.50 per million input/output tokens, makes it roughly 14× cheaper than Claude Opus 4.7 and up to 60× cheaper than GPT-5.5's highest-effort tier. However, its exclusivity to the Cursor ecosystem, with no public API and no multi-provider availability, represents a significant architectural constraint. Cursor has also announced a partnership with SpaceXAI to train a substantially larger model using 10× more compute, signaling that Composer 2.5 is an intermediate step in a broader capability trajectory.
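
The token prices quoted above can be turned into a per-task estimate with a simple calculation. The sketch below uses the $0.50/$2.50 and $5/$25 list prices from this analysis. The token counts (400,000 input, 20,000 output) are a hypothetical task profile, not measured data. With identical token counts the list prices alone give a 10× gap, so the larger multiples quoted above presumably also reflect differences in tokens consumed per task or in effort tier, which this sketch does not model.

```python
from typing import NamedTuple


class Price(NamedTuple):
    input_per_m: float   # USD per million input tokens
    output_per_m: float  # USD per million output tokens


# List prices quoted in this analysis.
PRICES = {
    "Composer 2.5 (standard)": Price(0.50, 2.50),
    "Claude Opus 4.7": Price(5.00, 25.00),
}


def workload_cost(price: Price, input_tokens: int, output_tokens: int) -> float:
    """Cost in USD for a given number of input and output tokens."""
    return (
        input_tokens * price.input_per_m + output_tokens * price.output_per_m
    ) / 1_000_000


if __name__ == "__main__":
    # Hypothetical task profile, chosen only for illustration.
    input_tokens, output_tokens = 400_000, 20_000
    costs = {
        name: workload_cost(price, input_tokens, output_tokens)
        for name, price in PRICES.items()
    }
    for name, cost in costs.items():
        print(f"{name}: ${cost:.2f} per task")
    ratio = costs["Claude Opus 4.7"] / costs["Composer 2.5 (standard)"]
    print(f"Ratio at equal token counts: {ratio:.1f}x")
```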

Analysis generated: 2026-05-23