AI Agent2026-05-20

Elon Musk's $10 Billion Bet: Why Coding Agents Are the New Frontier in AI

The Unlikely Partnership: From Feud to Alliance

In a surprising turn of events, two of OpenAI's biggest rivals, Anthropic and Elon Musk, recently set aside their differences to form a strategic partnership. Their relationship had been anything but smooth. Earlier this year, Musk took to his X account to publicly denounce Anthropic as "woke," "evil," and "anti-humanity."

However, this outburst wasn't just Musk's typical volatility—it was triggered by a specific move from Anthropic that hit a nerve. It turns out that xAI, Musk's AI venture, had been using Cursor, a popular AI-powered coding tool. But at the start of the year, employees discovered that Claude models were no longer available on xAI's Cursor account. Wu Yuhuai, a co-founder of xAI at the time, circulated a letter to all staff stating: "Anthropic has updated its policies to prohibit Cursor from providing access to Claude models for major competitors."

In that letter, Wu wrote a line that now seems prescient:

"This is both bad news and good news. Productivity will drop, but it could also spark the development of our own coding products and models."

Why did xAI executives at the time believe developing proprietary coding products was so critical?

As we know, what followed was dramatic: the entire founding team at xAI resigned, and Musk leveraged his financial might against Cursor. Last month, SpaceX and Cursor announced an unprecedented strategic partnership focused on training AI models for programming and knowledge-base tasks. SpaceX also secured the right to acquire Cursor for $60 billion—or pay $10 billion as a collaboration fee.

Note the key qualifier: "programming." We'll return to this later.

The $10 Billion Question: What Data Is Cursor Sitting On?

Recently, I watched a video by Theo Browne, an early Cursor investor and founder of T3 who has been critical of Anthropic. I clicked expecting to see him dismiss the SpaceX-Anthropic moves, but instead found a nuanced—and highly logical—analysis of the SpaceX + Cursor deal.

Let's focus on the $10 billion collaboration fee rather than the $60 billion acquisition. In the video, Theo stated: "Even just for the exchange of Cursor's user data, $10 billion would be well worth it."

What kind of data is this? If you've seen Theo's video, you'll understand, but here's a quick summary.

Interactions with AI are a back-and-forth: you present a question or request, and you get an answer. Coding agents work the same way, except the output is code.

When you combine an entire high-quality interaction—the user's prompt, the model's reasoning, the agent's plan, the code output, and the validation—you get a complete Agentic Loop. This becomes high-value training data. Feeding it into a model for reinforcement learning can further boost its performance in real-world scenarios.

What SpaceX wants from Cursor is this data.

But where does this data come from? The answer is simple: for model vendors, the most direct source of such high-quality data is their own proprietary coding agent products—like Anthropic's Claude Code, OpenAI's Codex, or Kimi's Kimi Code.

Now it's clear why Wu Yuhuai proposed developing xAI's own coding products after the "account suspension" by Anthropic. At the time, xAI already realized that without proprietary coding products, they couldn't access high-quality reinforcement learning data, and without such data, they couldn't train coding models with true real-world capabilities.

It's a bold claim, but let's dive into the core argument: for model vendors to create programming models that actually work in practice, building their own coding agent products is the only path forward.

Why Process Data Beats Outcome Data in AI Training

Large language models are like crystal balls, trained on web-scale corpora. At first glance, they seem capable of answering any question, but that doesn't mean they provide high-quality answers to every query.

You can train a coding model on hundreds of millions of code entries from GitHub. This "outcome learning" logic is valid—whether code runs or tests pass is clear.

However, the process leading to outcomes is a complex chain involving multi-step decision-making, error correction, and intent alignment. User acceptances, rejections, refinements, cancellations, re-questions, and even the scolding when models fail or get things wrong—all these are process signals along this chain.

Reinforcement learning has two supervision methods. One is called outcome supervision, which only looks at whether the final result passes. But outcome supervision can lead to "reward hacking": models might write redundant, fragile, logically flawed code that still passes tests, making them think they've learned correctly.

The other is process supervision, which scores each step of the reasoning path. The process signals mentioned above are generated only in the execution environment of coding agents. GitHub repositories only have outcomes. Even looking at individual commit histories or PRs, you won't find effective process signals.

When effective, autonomously obtainable process signals are lacking, some model vendors resort to "distillation." Distillation logic is simple: for the same input, the student model learns what the teacher model outputs.

But what distillation yields is close to outcomes, not the chain of thought from the teacher model's internal probability distributions. If the student deviates from the teacher's trajectory during inference, even a single token mismatch can cause deviation.

This hits a fundamental limitation of reinforcement learning: the policy gradient theorem requires that optimization samples are best generated by the model being optimized itself—this data is called on-policy data. Using data from other products to train your model becomes off-policy data. Models can learn from it, but they can't learn the internal probability distribution information of the teacher model.

Companies like Cursor, which have their own coding agent products, hold the most realistic, effective, and high-quality training data. The Cursor product itself is the best training ground for coding models in a practical environment.

The Cursor Case Study: A Proof of Concept

APPSO readers might recall when Cursor launched Composer 2, touting it as "the next-gen dedicated programming model." The technical report was vague, with no specifics on the underlying model foundation.

Shortly after, code snippets with Kimi's model ID circulated in developer communities, forcing Cursor's VP Lee Robinson to clarify: "Composer 2 starts from an open-source foundation. About a quarter of the final model's compute comes from the foundation, with the rest trained in-house." Hours later, Cursor co-founder Aman Sanger added an apology: "It was a mistake not to mention the Kimi foundation upfront."

Five days later, Cursor released the full Composer 2 technical report, revealing the foundation was Kimi K2.5, licensed by Fireworks AI. The gist: training on K2.5, then continuing with large-scale reinforcement learning (RL).

Crucially, Composer 2's RL runs in actual Cursor sessions, using identical tools and harnesses in production. Cursor calls this "real-time RL"—deploying model checkpoints directly into Cursor's production environment, observing user responses, collecting data, and integrating reward signals. They iterate model versions as fast as every five hours, deploying to Cursor, and repeat.

The best example is Cursor's auto-complete feature, Tab, which handles over 400 million requests daily. Every time a user types a character or moves the cursor, the model predicts the next action. If confidence is high, it shows a suggestion; if the user hits Tab, they accept the auto-completion.

This feature uses online reinforcement learning, which is highly unique in the industry. Cursor updates Tab's capabilities to users at very high frequency (as fast as every 30 minutes to 2 hours), collecting on-policy data and training within the product.

This high-frequency, near-real-time feedback loop allows Tab to learn subtle user intents. Cursor revealed that this method reduced rejection rates for Tab suggestions by 21% and increased acceptance rates by 28%.

Regarding the Composer model itself, after the controversy, some Kimi employees deleted earlier tweets expressing frustration, and Kimi's official account offered congratulations.

With a $60 billion valuation (based on Musk's figure), a coding agent application company without its own model foundation still succeeded in data flow.

Thus, rather than saying Cursor failed, it's a perfect example highlighting the importance of coding agent products.

In another article on real-time RL, Cursor wrote: "The biggest challenge (in training programming models) is user modeling. Composer's production environment includes not just computers running, but people supervising and guiding them. Simulating computers is easy; simulating the people using them is hard."

This sentiment is gradually becoming consensus among cutting-edge model vendors in the programming model field. Looking at benchmarks like SWE-bench or LLM-Stats, you can see which major vendors are closest to users.

In authoritative rankings, models like Claude, GPT, Gemini, and Kimi—all from vendors that have developed their own coding agent products (including CLI, IDE, and desktop clients with coding agent integration)—dominate the top 10.

Some rankings show exceptions like Meta (Muse Spark) or DeepSeek, which haven't developed proprietary coding agents. However, in more realistic scenarios and benchmarks designed to avoid contamination, these exception models struggle to reach top rankings. For instance, DeepSeek scores 70 on SWE-bench bash only, ranking 9th, but only around 15% on SWE-bench Pro.

OpenRouter's actual traffic data explains this difference. According to the platform's 2025 report, over 80% of Claude's token consumption goes to programming and technical tasks, while most of DeepSeek's token usage focuses on chat and role-playing.

Vendors without proprietary coding products might top benchmarks for some coding tasks, but in harder real-world engineering benchmarks and actual traffic where users spend tokens, the true picture emerges.

The Strategic Shift: Coding Agents as the New High Ground

In the evolution of AI, the definition of production factors has changed dramatically. Traditional key elements—compute, research, and training data—are growing in overall volume but facing severe structural imbalances.

Today's major AI companies have significantly increased capital expenditure (CapEx) on compute, with infrastructure being a hot topic. However, in practice, especially in programming, public code data from sources like GitHub and StackOverflow has been utilized by foundational model vendors in a "drain the pond to get all the fish" manner, gradually clarifying the boundaries of models' code generation and logical reasoning capabilities.

This is why industry consensus is shifting toward a new strategic high ground.

For model vendors aiming to master top coding capabilities, building proprietary coding agent products is no longer an optional business route—it's a core lifeline for continuous foundational model evolution.

As argued earlier, learning only from public data means learning only from successful outcomes, not the path to success. This isn't true success science. In real programming environments, knowing what went wrong, how it went wrong, and how to understand and efficiently practice requirements—understanding the value of correct processes—is far more important than just achieving correct outcomes.

Only model vendors with proprietary coding products can obtain high-quality "process supervision" signals, maintaining a technical moat in the next phase of competition for coding/reasoning capabilities. Otherwise, like SpaceX, they'll have to pay coding agent product companies for collaboration. But not all model vendors are as wealthy as Musk, and starting in 2026, the division, alliances, and territorial disputes among giants will intensify. When vendors without proprietary coding products finally wake up, they might find insufficient partners and soaring collaboration costs.

The Industry Landscape: Everyone Is Building Coding Agents

The situation of US model companies is well-known. APPSO also noticed that most major domestic model vendors and AI giants are already deploying coding agent products.

Domestic giants are mainly adopting approaches with native AI IDEs or IDE plugins. ByteDance launched TRAE early last year, Alibaba has Qoder, Tencent has CodeBuddy, Baidu has Wenxin Kuaima Comate, and so on.

Among smaller AI companies, Moonshot AI was among the earliest to develop an independent coding agent product, Kimi Code, with a CLI interface.

Another approach is for model vendors to provide API services and coding plans themselves. In this case, regardless of what AI development environment users employ, vendors can obtain process data as close as possible to native coding products through server-side API logs.

But this is only an approximation, not identical. The core issue is that server-side API request-response logs still have a significant gap compared to deeply integrated product interaction trajectories.

Vendors that build their own products (like Cursor, Claude Desktop, Codex) have the most direct and clear feedback signals, while the API side relies on relatively vague, indirect guesses. Simply put, the API side can see user requests and responses, but they don't know if the user ultimately adopted the code, whether it was executable, or what bugs occurred. They can't understand the crucial label of user final actions, preventing them from achieving the highest-quality reinforcement learning.

To put it formally: language is the world, and code is the solution. Code can express almost all tasks in the world, making it the greatest amplifier, multiplying the productivity of top talent.

Only the best coding models deserve the best talent. If major model vendors don't prioritize coding, they'll drop out of the top tier.

Of course, in practice, not all model vendors ignore coding—rather, under the new paradigm, companies without proprietary native coding agent products are increasingly likely to fall behind those with them.

Recently, MiniMax announced a major update to its desktop client product, significantly improving support for coding tasks.

Shortly after, on May 15, Alibaba officially launched its product—upgrading from an IDE format to a full agent product (officially called an intelligent agent development workbench).

Simultaneously, xAI's Grok Build CLI was finally officially released.

Yes, this is the coding agent xAI built after being "account-suspended" by Anthropic and Cursor earlier this year.

More ready-to-use examples keep emerging. It seems everyone believes Cursor, Codex, and Claude Desktop are on the right track.

From Coding to General AI Agents: The Data Imperative

Expanding the discussion from coding to agents, the situation is the same. In public corpora, you can still find trajectory data for coding tasks (e.g., GitHub commit histories/PRs, though quality is low). But trajectory data for agent tasks (mouse movements and clicks, touchscreen operations, input field entries, etc.) cannot be found in public corpora.

Therefore, even for browser plugins, we see most model vendors building their own.

OpenAI released Operator in January 2025, which can be described as "AI-driven browser automation" but is essentially a large-scale data collection device. Every user who tries Operator is providing on-policy data to OpenAI for free.

OpenAI later spawned ChatGPT Agent and the new Codex desktop client. Anthropic did similarly, and recently Kimi quietly started a project called WebBridge, a browser plugin.

Even DeepSeek, a Chinese model giant that has been relatively quiet over the past two years, has recently shown interest in agents. CEO Liang Wenfeng said in a previous interview:

"Math and code are natural experimental grounds for AGI, like Go—a closed, verifiable system where high intelligence could be achieved through self-learning."

The implication was that DeepSeek has always treated coding and agents as research experimental grounds, not a commercialization direction.

However, this March, DeepSeek opened over ten agent-related job positions, recruiting for roles like model strategy product manager (agent direction) for the first time. Job requirements included "deeply using products like Anthropic's Claude Code and Manus."

APPSO noticed that DeepSeek has recently been recruiting for positions like agent product manager and harness product manager. Clearly, DeepSeek wants to build independent, native coding/agent products.

According to previous materials, during the training of DeepSeek V3.2, about 2,000 synthetic agent training environments and over 80,000 complex instructions were introduced. But synthetic data alone can only take DeepSeek so far. What synthetic data can't provide is the real successes and failures of real users in real environments—something only obtainable through having proprietary agent products.

DeepSeek has developed models and products in a very restrained manner for three years. But today, it's becoming difficult to achieve SOTA in coding tasks, and even if achieved, it's quickly surpassed.

As the research-dependent approach proves unsustainable, DeepSeek has taken action.

Conclusion: The Blurring Line Between Models and Products

Finally, let's return to the opening story. According to The Information, upon receiving Musk's $60 billion acquisition/$10 billion collaboration offer, Cursor reportedly said it would not collaborate on developing new models with xAI, focusing instead on optimizing its own Composer model.

This suggests that even if acquired by Musk, Cursor wants to maintain sovereignty over data flow.

Data ownership is the most critical hidden bargaining point.

As all top model vendors build their own products and all top products train their own models, the boundary between "model companies" and "product companies" is increasingly blurred.

This game has only just begun.

We are looking for collaborators! 📮 Submit resumes to hr@ifanr.com ✉️ Email subject: "Name + Position" (please attach resumes and related projects/works or links)

Comments (0)

Share:X Hatena

Back to Blog