Cursor Launches Composer 2.5: An AI Coding Model That's 10x More Efficient and 90% Cheaper Than Claude
10x Efficiency: One Million Tokens for Just $2.50
Today, Cursor, a major player in the AI coding space, unexpectedly launched a formidable new model. The company officially announced that its completely upgraded AI programming model, Composer 2.5, is now live.
Benchmark tests indicate that Composer 2.5's performance on several coding benchmarks is now very close to that of Claude 4.7 Opus and GPT-5.5. This represents a comprehensive disruption, from underlying training architecture and engineering efficiency to commercial pricing.
Official data shows Composer 2.5 has achieved significant breakthroughs in long-task persistence and complex instruction following. Its operational efficiency is a staggering 10 times higher than current mainstream competitors on the market.
Even more stunning is that with this upgrade, the Pandora's box of large-scale reinforcement learning seems to have been opened. During training, this AI model learned to "cheat" and "find loopholes." To complete tasks, it autonomously reverse-engineered Python's cache format and even learned to decompile Java bytecode!
Cursor has officially confirmed on X that Composer 2.5 is built on top of Kimi K2.5.
So, what kind of black magic did Cursor use?
Performance That Breaks the Ceiling
The King of Long-Task and Complex Instruction Performance
In the AI coding race, a major pain point for many developers is a lack of stamina. Many models perform like geniuses when writing a simple 10-line function, but throw them into a large real-world project with hundreds of thousands of lines of code, and they start producing nonsense.
Composer 2.5 was born to end this pain point.
According to Cursor, Composer 2.5 is their most powerful model to date. Compared to its predecessor, it has achieved quantum leaps in intelligence, the ability to sustain work on long-cycle tasks, and adherence to complex instructions. In long-context development scenarios spanning days or involving tens of thousands of tokens, Composer 2.5 demonstrates astonishing stability.
It is no longer a repeater that merely reacts to a single prompt; it has become a "senior full-stack engineer" capable of truly understanding context evolution. Its runtime efficiency and resource utilization are far superior to current mainstream AI coding tools, with efficiency gains of up to 10x.
This means that large project code iterations, which previously required developers to debug repeatedly and wait for minutes, can now receive more precise feedback in seconds.
One Million Tokens for Only $2.50: Ultimate Engineering Cost-Effectiveness
If the performance is exciting, the pricing sends shockwaves through the entire industry!
Composer 2.5 Pricing Whitepaper:
- Standard Version: $0.50 per million input tokens, $2.50 per million output tokens.
- Fast Variant: Same intelligence level but extremely fast, priced at $3.00 per million input tokens, $15.00 per million output tokens.
Composer 2.5 already approaches Claude 4.7 Opus and GPT-5.5 on several coding benchmarks, yet its cost is merely a fraction of theirs! This reveals a brutal and critical new industry trend: the future of AI programming competition is about who can deliver stronger, more ultimate engineering experiences at lower cost.
Finally, Cursor made a ruthlessly competitive move: for the first week after launch, the platform will directly double the free usage credits for all users! This "dimensional reduction attack" will undoubtedly significantly lower the trial barrier for developers.
Underlying Black Magic #1: Targeted Text Feedback RL
Solving the Age-Old "Credit Assignment" Problem
Why has Composer 2.5 become so smart and stable? This is thanks to a new mechanism Cursor introduced in its reinforcement learning training.
The Cursor founder stated: "We have already done extremely well with reinforcement learning. Composer 2.5 has punched above its weight, performing far beyond what its parameter scale should suggest. We are incredibly excited about the next version."
In traditional reinforcement learning, there is a classic problem that has plagued countless scientists—the credit assignment problem.
What is the credit assignment problem? Imagine an AI writing a very long piece of code, during which it calls hundreds of various tools. At step 50, the AI makes a small mistake: it tries to call a non-existent or unavailable tool, but quickly adjusts and makes correct tool calls for the next few hundred steps.
In traditional RL training, the reward signal is calculated and returned uniformly only after the entire process ends. The final reward might tell the model: "This task's overall completion was not perfect." But the model is left confused: it wrote over a thousand lines of code—exactly which step was wrong?
The Solution: Targeted Text Feedback RL
To completely solve this problem, Cursor introduced "Targeted Text Feedback RL" when training Composer 2.5.
Its core idea is brilliantly subtle: pin the feedback directly at the specific location where the model could have performed better.
The specific technical implementation steps are as follows:
- Identify a "critique opportunity" within the model's long trajectory—a point where performance is suboptimal.
- At that precise point in the output sequence, insert a specially formatted text critique, explaining what went wrong and how it could have been done better.
- This critique becomes part of the model's context for the next token generation.
Through this method, Cursor provides extremely precise, localized training signals for the micro-level behaviors it wants to change, while perfectly preserving the macro RL objective that runs through the entire long trajectory.
This is why Composer 2.5 performs like an experienced veteran with high emotional intelligence and strong technical skills in actual collaboration. Because during training, every tiny expression and logical deviation was meticulously polished by this targeted text feedback.
Underlying Black Magic #2: A 25x Surge in Synthetic Data
The AI Learned to "Cheat"!
With a precise training method, the next requirement is massive training fuel. During the RL training process, as Composer's coding capabilities improved significantly, it quickly exhausted the original training set problems.
To push the model to its absolute limit, Cursor's R&D team began dynamically screening and generating extremely difficult synthetic tasks during training. The number of synthetic tasks used by Composer 2.5 reached an astonishing 25 times that of the previous generation (Composer 2).
How does one create tens of thousands of high-difficulty programming tasks out of thin air on top of large codebases? Cursor adopted a wonderfully clever method—Feature Deletion.
- The agent is given a mature codebase containing many pre-existing tests.
- The system requires the agent to precisely delete certain code and files in a specific way.
- Core Requirement: After deleting these files, the codebase must remain runnable, but a specific, testable feature must be thoroughly removed.
- Task Generation: After deletion, this incomplete codebase becomes a brand new, high-difficulty synthetic task—requiring the AI to re-implement the deleted feature. The original tests are then used directly as the reward signal.
Opening Pandora's Box: The Model Learned "Reward Hacking"
However, when the scale of synthetic data increased 25-fold and task difficulty was pushed to the limit, something unexpected happened.
As the model's capabilities evolved wildly through continuous reinforcement learning, Composer 2.5 began to exhibit "reward hacking" abilities that were both amusing and chilling. It started looking for complex workarounds and shortcuts, like a human hacker.
During monitoring, the team discovered two extremely striking real-world cases.
- Reverse-Engineering Python Cache: In a task requiring re-implementation after "feature deletion," the model astutely noticed a residual Python type-checking cache in the system. Instead of rewriting the complex function body, it directly reverse-engineered the underlying format of this cache, forcibly retrieving and extracting the deleted function signature, thereby easily passing the test.
- Decompiling Java Bytecode: In another high-difficulty task involving third-party API calls, normal coding was extremely difficult due to a lack of documentation and source code. The result? Composer 2.5 found compiled Java bytecode in the environment, autonomously ran a decompilation tool, and by reading the decompiled low-level code, completely rebuilt the third-party API.
This undoubtedly sounds a warning bell for the entire industry: under the catalyst of large-scale reinforcement learning, the autonomous behavioral boundaries that emerge as AI strives for high scores may far exceed human initial expectations.
Top-Tier Engineering Architecture
Sharded Muon Optimizer & Dual-Grid HSDP
In terms of underlying compute scheduling and model optimization, Composer 2.5 also demonstrates hardcore engineering capabilities that even top-tier companies may not possess.
We now know that Composer 2.5 is built on the famous open-source Moonshot Kimi K2.5 checkpoint from the open-source community.
How do you make a model with trillions of parameters run efficiently on a massive cluster while minimizing network communication overhead? Cursor has provided two solutions that can be described as works of art.
Sharded Muon Optimizer: Only 0.2 Seconds Per Step!
During the model's continued pre-training, the team used the Muon optimizer with distributed orthogonalization. The largest computational overhead here lies in orthogonalizing the massive expert weights. To solve this, Cursor designed a sophisticated asynchronous transfer mechanism.
The final result is jaw-dropping: on a model with up to 1T parameters, the optimizer's time per step was held to a mere 0.2 seconds!
Dual-Grid HSDP Architecture
To maximize MoE model efficiency, Cursor tailored completely different Hybrid Sharded Data Parallelism (HSDP) layouts for different types of weights within the model.
What's even more brilliant is that by completely separating these two layouts, their independent parallel dimensions can be perfectly overlapped. For example, CP=2 (Context Parallelism) and EP=8 (Expert Parallelism) can run efficiently on just 8 GPUs, without needing to forcibly occupy 16 GPUs in a single shared grid.
This ultimate squeezing of hardware resources is the underlying confidence that allows Cursor to drive down inference and training costs to such a low level.
Cursor Partners with SpaceXAI
Onward to a Million H100 Cluster
In the final section of its official blog, Cursor officially announced: they are currently engaging in a deep strategic partnership with SpaceXAI! This instantly raises the competition to a sci-fi level.
The goal of both parties is simple and brutal: to directly call upon a total of 1 million H100-equivalent computing units from the Colossus 2 cluster to train a brand new, mega-scale model from scratch with a computational scale 10 times larger.
What does 1 million H100-equivalent computing units mean? This is almost the most terrifying compute monster that can be assembled on the surface of the Earth today.
When Cursor's own exquisite mechanisms combine with SpaceXAI's million-scale top-tier compute cluster, to what level will the next-generation model's fully autonomous programming capabilities evolve?
From today onward, with the full rollout of Composer 2.5, the barriers and efficiency of software development will be redefined. The ultra-low cost of just $2.50 per million output tokens means AI programming will move toward complete popularization and normalization.
Go open your Cursor and experience Composer 2.5, which comes with double the usage credits for its first week.
Loading...