AI Agent
2026-05-17

The AI Agent Wars: From Toys to Tools, and the Fight to Define the Next-Generation Interaction Standard

As recently as two months ago, AI Agents were characterized by persistent limitations:

They could write scripts but instantly forgot the content upon completion
Complex task requests were met with demands for "more context"
Every conversation felt like a first meeting, requiring users to re-explain requirements

The situation has transformed dramatically.

Hermes Agent has amassed 154K GitHub stars and runs tasks autonomously around the clock, backed by a three-tier memory system that lets its skills self-evolve. OpenAI's Codex can ingest an entire codebase and fix in 30 minutes a bug that takes a human 2.5 hours. Anthropic has launched 10 pre-built agents for the financial services sector, covering everything from business plans to credit memos, all use cases with extremely high commercial value.

AI agents are no longer a "future story"; we're in the midst of a real battle for dominance.

Open-source projects, big tech, and startups are all competing fiercely across three key domains: memory modules, multi-agent coordination, and enterprise workflows.

Whoever conquers these areas will define the next-generation AI interaction standard.

The Rapid Rise of the Open-Source Camp

On May 14, Nous Research tweeted:

Hermes Agent has reached #1 in token usage on OpenRouter.

This isn't just open-source hype; it's developers voting with their actual usage. High OpenRouter token consumption means the agent is being frequently utilized in real-world scenarios.

Hermes Agent's success stems from three key factors:

1. Three-Tier Memory Architecture

Short-term cache + persistent storage + self-evolving skill library. Simply put:

It remembers what you just discussed
It remembers what you discussed last week
It saves newly acquired skills for direct use next time

Two months ago, an agent's memory was wiped clean after each conversation. Today, once Hermes has been taught something, it can reuse it.
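To make the three tiers concrete, here is a minimal sketch in Python. It is an illustration only, not Hermes Agent's actual implementation: the class name `ThreeTierMemory`, its methods, and the in-memory list standing in for persistent storage are all assumptions made for this example.

```python
from collections import deque
from typing import Callable, Deque, Dict, List


class ThreeTierMemory:
    """Hypothetical three-tier memory; not Hermes Agent's real code."""

    def __init__(self, short_term_size: int = 3) -> None:
        # Tier 1: short-term cache that keeps only the most recent messages.
        self.short_term: Deque[str] = deque(maxlen=short_term_size)
        # Tier 2: persistent store (a real system would use disk or a database).
        self.persistent: List[str] = []
        # Tier 3: skill library mapping a skill name to a reusable callable.
        self.skills: Dict[str, Callable[[str], str]] = {}

    def remember(self, message: str) -> None:
        # Every message goes to both the cache and the persistent store.
        self.short_term.append(message)
        self.persistent.append(message)

    def recall(self, keyword: str) -> List[str]:
        # Search the persistent tier, which still holds what the cache dropped.
        return [m for m in self.persistent if keyword.lower() in m.lower()]

    def learn_skill(self, name: str, fn: Callable[[str], str]) -> None:
        # Save a newly acquired skill so it can be reused later.
        self.skills[name] = fn

    def use_skill(self, name: str, arg: str) -> str:
        return self.skills[name](arg)


memory = ThreeTierMemory(short_term_size=2)
for msg in ["Deploy on Friday", "Use Python 3.12", "Report goes to Dana"]:
    memory.remember(msg)
memory.learn_skill("shout", lambda text: text.upper())
```

With a cache size of 2, the oldest message falls out of the short-term tier but can still be found through `recall`, which is the difference between "what you just discussed" and "what you discussed last week."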

2. Multi-Profile Support

A single agent can switch between multiple personas or areas of expertise, such as "Python Expert Mode," "Data Analysis Mode," and "Writing Assistant Mode." This isn't just a matter of swapping prompts; each profile loads a different skill tree.
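The difference between swapping a prompt and loading a different skill set can be sketched as follows. Everything here is hypothetical: `Profile`, `ProfileAgent`, and the two toy skills are names invented for this example and are not taken from Hermes Agent.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class Profile:
    """Hypothetical profile: a prompt plus its own set of skills."""
    name: str
    system_prompt: str
    skills: Dict[str, Callable[[str], str]] = field(default_factory=dict)


class ProfileAgent:
    def __init__(self, profiles: List[Profile]) -> None:
        self.profiles = {p.name: p for p in profiles}
        self.active: Optional[Profile] = None

    def switch(self, name: str) -> None:
        # Switching changes both the prompt and the skills that can be called.
        self.active = self.profiles[name]

    def available_skills(self) -> List[str]:
        if self.active is None:
            return []
        return sorted(self.active.skills)

    def run(self, skill: str, text: str) -> str:
        if self.active is None or skill not in self.active.skills:
            raise KeyError(f"skill not loaded in the active profile: {skill}")
        return self.active.skills[skill](text)


agent = ProfileAgent([
    Profile("python_expert", "You are a Python expert.",
            {"count_lines": lambda src: str(len(src.splitlines()))}),
    Profile("writing_assistant", "You are a writing assistant.",
            {"word_count": lambda text: str(len(text.split()))}),
])
agent.switch("python_expert")
```

In this sketch a skill that belongs to one profile is simply unavailable in another, which is what separates a skill tree from a prompt change.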

3. Tool Integration

It can call external APIs, generate videos, and manipulate files. Through its HyperFrames skill, Hermes can generate complete videos using natural language. This isn't an external API call—it's a native capability.

The fact that open-source projects have reached this level means big tech can no longer win on sheer resources alone. Developer communities back whatever works well.

Big Tech's Strategies: Four Distinct Approaches

Of course, big tech isn't going to let open source take their market. What's fascinating is that the strategies of the four major companies are completely different.

OpenAI: Enterprise-First, Security-Focused

OpenAI's agent strategy is clear: "Capture enterprise customers first, then expand to consumers."

On April 15, OpenAI updated its Agents SDK with three crucial features:

Native Sandboxing: Agents can execute code without damaging systems
File Checking: Scans uploaded files to prevent injection attacks
Long-task Memory Recovery: Resumes from checkpoints even after interruption

These are precisely what enterprise customers prioritize most. Individual users might not care, but for customers at Walmart's scale, these features are essential.
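Resuming a long task from a checkpoint can be illustrated in a few lines of plain Python. This is a generic sketch of the technique, not the OpenAI Agents SDK API: the function `run_with_checkpoints`, the JSON file layout, and the toy steps are all assumptions for this example.

```python
import json
import os
import tempfile
from typing import Callable, List


def run_with_checkpoints(steps: List[Callable[[], str]],
                         checkpoint_path: str) -> List[str]:
    """Run steps in order, saving progress after each one (hypothetical helper)."""
    state = {"next_step": 0, "results": []}
    if os.path.exists(checkpoint_path):
        # A previous run was interrupted: load its progress instead of restarting.
        with open(checkpoint_path, "r", encoding="utf-8") as f:
            state = json.load(f)
    for i in range(state["next_step"], len(steps)):
        state["results"].append(steps[i]())
        state["next_step"] = i + 1
        # Persist after every completed step so the work is not repeated.
        with open(checkpoint_path, "w", encoding="utf-8") as f:
            json.dump(state, f)
    return state["results"]


calls = {"fetch": 0, "analyze": 0}


def fetch() -> str:
    calls["fetch"] += 1
    return "fetched"


def analyze() -> str:
    calls["analyze"] += 1
    if calls["analyze"] == 1:
        raise RuntimeError("simulated interruption")
    return "analyzed"


path = os.path.join(tempfile.mkdtemp(), "task.json")
try:
    run_with_checkpoints([fetch, analyze], path)
except RuntimeError:
    pass  # the first run dies midway through the task
results = run_with_checkpoints([fetch, analyze], path)
```

The second call picks up at the failed step, so `fetch` runs only once across both attempts.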

That same day, OpenAI released GPT-5.5, natively supporting multi-agent systems. The main agent can now assign tasks to multiple specialized agents, each handling their area of expertise.
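The main-agent-plus-specialists pattern can be sketched without any particular vendor API. The code below is a hypothetical illustration: `MainAgent`, `research_agent`, and `coding_agent` are invented names, and the specialists are plain functions standing in for model calls.

```python
from typing import Callable, Dict, List, Tuple

# A specialist is modeled as a function from a task description to a result.
Specialist = Callable[[str], str]


def research_agent(task: str) -> str:
    return f"research notes for: {task}"


def coding_agent(task: str) -> str:
    return f"code draft for: {task}"


class MainAgent:
    """Hypothetical orchestrator that routes subtasks by role."""

    def __init__(self, specialists: Dict[str, Specialist]) -> None:
        self.specialists = specialists

    def delegate(self, subtasks: List[Tuple[str, str]]) -> List[str]:
        results = []
        for role, task in subtasks:
            specialist = self.specialists.get(role)
            if specialist is None:
                # Report the gap instead of failing the whole job.
                results.append(f"no specialist for role: {role}")
            else:
                results.append(specialist(task))
        return results


main = MainAgent({"research": research_agent, "coding": coding_agent})
output = main.delegate([
    ("research", "rate limits"),
    ("coding", "retry helper"),
    ("legal", "license review"),
])
```

A real system would also need the main agent to split the request into subtasks and merge the results; this sketch covers only the routing step.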

Anthropic: The Pursuit of Reliability

Anthropic's approach is more aggressive: offering "Cloud-Managed Agents."

Users don't need to deploy anything themselves or worry about scaling and security. Anthropic hosts everything; users simply use it.

The accompanying features are powerful:

"Dreaming": Agents autonomously review past conversations and update their memory. This isn't passive storage—it's active organization.
Outcomes: Success determination based on evaluation criteria. Users define "success," and the agent strives toward that goal.
10 Pre-built Financial Agents: Covering high-frequency use cases in the financial industry like business plans, credit memos, and risk assessments.
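The Outcomes idea, where the user defines "success" and the agent works toward it, can be sketched as a loop that retries until every criterion passes. This is not Anthropic's API: `run_until_success`, the criteria dictionary, and the toy `draft` function are assumptions made for illustration.

```python
from typing import Callable, Dict, Tuple

# Each criterion is a named check on the agent's output.
Criteria = Dict[str, Callable[[str], bool]]


def run_until_success(attempt: Callable[[int], str],
                      criteria: Criteria,
                      max_attempts: int = 3) -> Tuple[str, int, bool]:
    """Retry until all user-defined criteria pass (hypothetical helper)."""
    output = ""
    for n in range(1, max_attempts + 1):
        output = attempt(n)
        if all(check(output) for check in criteria.values()):
            return output, n, True
    # Out of attempts: return the last output, marked as unsuccessful.
    return output, max_attempts, False


def draft(attempt_number: int) -> str:
    # Toy stand-in for an agent that improves its draft on the second try.
    if attempt_number < 2:
        return "Summary"
    return "Summary with risk section"


criteria: Criteria = {
    "mentions_risk": lambda text: "risk" in text,
    "is_short": lambda text: len(text) < 80,
}
final_output, attempts_used, succeeded = run_until_success(draft, criteria)
```

Keeping the criteria as named checks makes it possible to report which definition of success failed, although this sketch only returns the overall verdict.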

According to recent WSJ reports, Anthropic's financial services agents are already deployed in production. This isn't a concept demo; it's a live system.

Google: Platform Strategy

Google employs a solid platform strategy: "Provide the platform and let others build on it."

At the Cloud Next conference in April, Google announced the Gemini Enterprise Agent Platform:

Agent Studio: Visually orchestrate agent workflows
Governance and Security: Enterprise-level permission management and audit logs
Integration with Vertex AI: Seamless connection with existing Google Cloud services

Simultaneously, Google released Gemma 4, an open-source model optimized for agent workflows. Even for those seeking open-source solutions, Google is countering with its own models.

Meta: Consumer Penetration

Meta's strategy is the most distinctive: "Enter from the consumer side and build out shopping and social media use cases."

According to Reuters, Meta is internally testing an agent called "Hatch" that will be integrated into Instagram and WhatsApp. If you find clothing you like on Instagram, the agent can handle the entire ordering process.

Meanwhile, Meta is developing its own model, "Muse Spark," to reduce its dependence on Llama. The goal is to maintain dedicated models without being constrained by open-source ones.

Three Critical Domains

What big tech and open-source projects are essentially fighting over are these three domains:

1. Memory Modules

Importance: An agent without memory is in a "first meeting" state every time.

Imagine talking to a colleague who forgets everything you've previously discussed after every conversation. That would be intolerable.

There are three representative technical approaches:

Hermes: Three-tier structure (cache + persistent + evolving)
OpenAI: Checkpoint resumption via native memory recovery
Anthropic: Self-review and active organization via "Dreaming"

Memory modules form the foundation of an agent's "personality." Whoever defines the memory standard will define agent "continuity."

2. Multi-Agent Coordination

Importance: Complex tasks require division of labor.

You can't do a team's work alone, and neither can agents.

Typical examples:

NVIDIA: Supply chain optimization using cuOpt multi-agent systems, orchestrated with LangChain for automatic logistics route planning.
Research papers: Highlighting the "Sovereignty Gap" problem in multi-agent systems, where agents inhibit each other and fail to reach correct solutions.

Multi-agent coordination represents the agent's "organizational form." Whoever solves coordination problems will be able to handle more complex tasks.

3. Enterprise Workflows

Importance: Most directly tied to revenue.

Open source can win developers' hearts, but true cash flow comes from enterprise customers.

Company movements:

OpenAI: Commerce agent through Walmart partnership
Anthropic: 10 pre-built agents for financial services
Google: Enterprise governance, security, and orchestration platform

Enterprise workflows represent the agent's "commercialization path." Whoever secures the earliest enterprise customers will obtain the funding for continuous iteration.

Community Strategy: GitHub Stars vs. Practical Value

How do open-source projects compete with big tech ecosystems?

Hermes answered with the "Hermes Agent Challenge."

The rules are simple:

Build something useful with Hermes or share your experience
Prize: $1,000
Purpose: Capture developer mindshare and build an ecosystem

This is an extremely clever strategy. The $1,000 amount isn't enormous, but it encourages many developers to experiment, share, and build projects. Community ecosystems expand this way.

Big tech captures markets through enterprise contracts; open source captures ecosystems through community challenges. The approaches differ, but the contested territories are the same.

What's Available Now for General Users

Specifically, what features are available today? Three examples:

1. Code Fixes

Import an entire project into OpenAI Codex, and it can solve in 30 minutes what would take 2.5 hours manually. This isn't future talk; it's available now.

2. Video Generation

Use Hermes's HyperFrames skill to generate complete videos from natural language descriptions. No need to learn editing software; just describe what you want.

3. Supply Chain Optimization

NVIDIA cuOpt's multi-agent system automatically plans logistics routes. This is an enterprise application, but the principle is the same: "complex task execution through multi-agent coordination."

What to Watch for in Late 2026

The main battlefield is clear. The focus now is on who actually conquers it.

Three metrics to watch:

1. Can Hermes keep growing beyond its current 154K stars? If Hermes becomes the open-source agent standard, it proves the community has the ability to define next-generation interaction paradigms.

2. Which platform secures the earliest enterprise customers? Among OpenAI, Anthropic, and Google, who will first secure more than 10 Fortune 500 enterprise customers? That creates first-mover advantage.

3. Can the "Sovereignty Gap" in multi-agent coordination be solved? If multi-agent systems can cooperate stably, agents will handle more complex tasks. Otherwise, they'll remain just "advanced toys."

The battle for AI agent dominance has only just begun.

Two months ago, agents were experimental toys. Today, they're already productive tools.

What happens next? We'll be watching closely.
