Claude Sonnet 5 Arrives: Anthropic's Mid-Range Model Beats GPT-5.5 on 5 of 7 Benchmarks

On June 30, 2026, Anthropic officially released Claude Sonnet 5, the most powerful model in the Sonnet lineup to date. Now the default for Free and Pro users, Sonnet 5 beats GPT-5.5 on five of seven shared benchmarks, at 60% of Opus 4.8's list price (40% during the launch promotion).

This is more than a routine upgrade. Sonnet 5 represents a watershed moment: for the first time, a mid-tier model outperforms the previous generation's flagship across multiple key benchmarks. For developers and enterprise users alike, the question of which model to use — and how to balance cost against performance — has never been more nuanced.

Specifications at a Glance

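| Spec | Claude Sonnet 5 | Claude Sonnet 4.6 | Claude Opus 4.8 | GPT-5.5 |
| --- | --- | --- | --- | --- |
| Release Date | June 30, 2026 | March 2026 | May 28, 2026 | April 23, 2026 |
| Developer | Anthropic | Anthropic | Anthropic | OpenAI |
| Context Window | 1M tokens | 200K tokens | 1M tokens | 1,050K tokens |
| Input Price (/1M tokens) | $2 (promo) → $3 | $3 | $5 | $5 |
| Output Price (/1M tokens) | $10 (promo) → $15 | $15 | $25 | $30 |
| Cache Hit Discount | 90% | 90% | 90% | Yes |
| Batch Discount | 50% | 50% | 50% | 50% |

Max Output: 128K tokens is listed for two of the four models; the source layout does not show which two.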

Sonnet 5's pricing is aggressively competitive. During the promotional period (through August 31, 2026), input costs drop to just $2 per 1M tokens, matching Gemini 3.1 Pro's input price while delivering significantly better performance. Even at its standard rate of $3 per 1M input tokens, it costs only 60% of GPT-5.5's $5 input rate.
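To get a feel for how the cache and batch discounts in the table might affect the input rate, here is a minimal sketch. It rests on assumptions the article does not state: that the 90% cache-hit discount applies only to the cached share of input tokens, and that the 50% batch discount multiplies whatever is left. The function name and parameters are hypothetical, chosen for illustration only.

```python
def effective_input_price(base_price, cached_fraction=0.0, batch=False,
                          cache_discount=0.90, batch_discount=0.50):
    """Blended input price in USD per 1M tokens.

    Assumptions (not from the article): the cache discount applies only to
    the cached fraction of input tokens, and the batch discount, when used,
    multiplies the resulting price.
    """
    if not 0.0 <= cached_fraction <= 1.0:
        raise ValueError("cached_fraction must be between 0 and 1")
    cached_part = cached_fraction * base_price * (1 - cache_discount)
    uncached_part = (1 - cached_fraction) * base_price
    price = cached_part + uncached_part
    if batch:
        price *= 1 - batch_discount
    return price


# Sonnet 5 at its standard $3 input rate, with half the prompt served from cache.
print(round(effective_input_price(3.00, cached_fraction=0.5), 4))
# Same workload submitted as a batch job.
print(round(effective_input_price(3.00, cached_fraction=0.5, batch=True), 4))
```

Whether the two discounts actually stack this way depends on the provider's billing rules, so treat the result as a rough planning figure.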

Benchmark Breakdown: How Strong Is Sonnet 5?

Coding

| Benchmark | Sonnet 5 | Sonnet 4.6 | Opus 4.8 | GPT-5.5 |
| --- | --- | --- | --- | --- |
| SWE-bench Pro (Agentic Coding) | 63.2% | 58.1% | 69.2% | 58.6% |
| Terminal-Bench 2.1 (Terminal Coding) | 80.4% | 67.0% | 82.7% | 78.2% |

SWE-bench Pro is the gold standard for evaluating AI coding agents. Sonnet 5's 63.2% doesn't just beat Sonnet 4.6 by 5.1 percentage points; it also surpasses GPT-5.5's 58.6%. In practical terms, Sonnet 5 resolves real GitHub issues 4.6 percentage points more often than OpenAI's model.

Terminal-Bench 2.1 measures coding ability during extended terminal sessions. Here, Sonnet 5 scores 80.4%, again besting GPT-5.5 (78.2%) and closing in on Opus 4.8 (82.7%).

Computer Use

| Benchmark | Sonnet 5 | Sonnet 4.6 | Opus 4.8 | GPT-5.5 |
| --- | --- | --- | --- | --- |
| OSWorld-Verified (Desktop) | 81.2% | 78.5% | 83.4% | 78.7% |

OSWorld-Verified tests how well an AI can operate within real desktop environments. Sonnet 5 reaches 81.2%, edging out GPT-5.5 (78.7%) and narrowing the gap with Opus 4.8 (83.4%) to just 2.2 points. For enterprises exploring AI-powered alternatives to traditional RPA, this is a meaningful signal.

Knowledge & Reasoning

| Benchmark | Sonnet 5 | Sonnet 4.6 | Opus 4.8 | GPT-5.5 |
| --- | --- | --- | --- | --- |
| HLE (with tools) | 57.4% | 46.8% | 57.9% | 52.2% |
| HLE (no tools) | 43.2% | 34.6% | 49.8% | 41.4% |

Humanity's Last Exam (HLE) is among the most demanding reasoning benchmarks available. With tool access, Sonnet 5 hits 57.4% — virtually matching Opus 4.8 (57.9%) and pulling well ahead of GPT-5.5 (52.2%). This indicates that Sonnet 5's reasoning capabilities are now approaching Anthropic's top-tier flagship.

Pricing Comparison: The New Value King

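| Model | Input (/1M tokens) | Output (/1M tokens) | 100K Input + 10K Output |
| --- | --- | --- | --- |
| Claude Sonnet 5 (promo) | $2 | $10 | $0.30 |
| Claude Sonnet 5 (standard) | $3 | $15 | $0.45 |
| Claude Sonnet 4.6 | $3 | $15 | $0.45 |
| Claude Opus 4.8 | $5 | $25 | $0.75 |
| GPT-5.5 | $5 | $30 | $0.80 |
| Gemini 3.1 Pro | $2 | $8 | $0.28 |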

Take a typical coding task: 100K tokens of input (code context) plus 10K tokens of output (generated code). At Sonnet 5's promo price, that costs just $0.30 — less than 40% of GPT-5.5's $0.80. Even after the promotional window closes ($0.45), it's still only 56% of GPT-5.5's price.
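The per-task figures can be reproduced with a few lines of arithmetic. The sketch below copies the per-1M-token prices from the pricing table; `PRICES` and `task_cost` are illustrative names, not part of any vendor SDK.

```python
# Prices in USD per 1M tokens (input, output), copied from the pricing table.
PRICES = {
    "Claude Sonnet 5 (promo)": (2, 10),
    "Claude Sonnet 5 (standard)": (3, 15),
    "Claude Sonnet 4.6": (3, 15),
    "Claude Opus 4.8": (5, 25),
    "GPT-5.5": (5, 30),
    "Gemini 3.1 Pro": (2, 8),
}


def task_cost(model, input_tokens, output_tokens):
    """Cost in USD of one request with the given token counts."""
    input_price, output_price = PRICES[model]
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000


# The article's typical coding task: 100K input tokens, 10K output tokens.
baseline = task_cost("GPT-5.5", 100_000, 10_000)
for model in PRICES:
    cost = task_cost(model, 100_000, 10_000)
    print(f"{model:<28} ${cost:.2f}  {cost / baseline:.1%} of GPT-5.5")
```

Changing the two token counts shows how the ranking shifts for output-heavy workloads, where GPT-5.5's $30 output rate weighs more.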

It's worth noting that Gemini 3.1 Pro is even cheaper ($0.28), but it falls well behind Sonnet 5 on both coding and computer-use benchmarks. On a performance-per-dollar basis, Sonnet 5 is the clear winner.

Sonnet 5 vs Sonnet 4.6: Is the Upgrade Worth It?

The improvements over Sonnet 4.6 are comprehensive:

| Dimension | Improvement | Notes |
| --- | --- | --- |
| SWE-bench Pro | +5.1pp | Meaningful coding gains |
| Terminal-Bench 2.1 | +13.4pp | Major leap in terminal proficiency |
| OSWorld-Verified | +2.7pp | More reliable desktop operations |
| HLE (with tools) | +10.6pp | Qualitative reasoning leap |
| HLE (no tools) | +8.6pp | Substantial unaided reasoning gains |
| Context Window | 200K → 1M tokens | |
| Input Price | −33% | $3 → $2 (promo) |

The standout improvements are the 13.4-point jump on Terminal-Bench 2.1 and the 5× context window expansion to 1M tokens. For current Sonnet 4.6 users, this is a no-brainer upgrade: better performance, a bigger context window, and, through August 31, a lower price. After the promotion ends, the price matches Sonnet 4.6.

Sonnet 5 vs GPT-5.5: Real-World Differences

Across seven shared benchmarks, Sonnet 5 takes a 5–2 lead:

| Benchmark | Winner | Margin |
| --- | --- | --- |
| SWE-bench Pro | Sonnet 5 | +4.6pp |
| Terminal-Bench 2.1 | Sonnet 5 | +2.2pp |
| OSWorld-Verified | Sonnet 5 | +2.5pp |
| HLE (with tools) | Sonnet 5 | +5.2pp |
| HLE (no tools) | Sonnet 5 | +1.8pp |
| CursorBench v3.1 | GPT-5.5 | +3.1pp |
| GDPval-AA | GPT-5.5 | +151 Elo |

GPT-5.5 still holds the edge on CursorBench (IDE-integrated coding) and GDPval-AA (real-world workloads), suggesting OpenAI retains advantages in productization and deployment maturity. But Sonnet 5's sweeping wins on core capability benchmarks — combined with its significant price advantage — make it the stronger choice for most use cases.
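The five Sonnet 5 margins can be checked directly against the scores reported earlier in this article. The sketch below covers only those five benchmarks, since the article gives margins but not both raw scores for CursorBench v3.1 and GDPval-AA; `SCORES` and `head_to_head` are illustrative names.

```python
# Scores in percent, copied from the benchmark tables in this article.
SCORES = {
    "SWE-bench Pro": {"Sonnet 5": 63.2, "GPT-5.5": 58.6},
    "Terminal-Bench 2.1": {"Sonnet 5": 80.4, "GPT-5.5": 78.2},
    "OSWorld-Verified": {"Sonnet 5": 81.2, "GPT-5.5": 78.7},
    "HLE (with tools)": {"Sonnet 5": 57.4, "GPT-5.5": 52.2},
    "HLE (no tools)": {"Sonnet 5": 43.2, "GPT-5.5": 41.4},
}


def head_to_head(scores, a, b):
    """Map each benchmark to (winner, margin in percentage points)."""
    result = {}
    for benchmark, row in scores.items():
        # Round to one decimal to hide floating-point noise in the subtraction.
        margin = round(row[a] - row[b], 1)
        winner = a if margin > 0 else b if margin < 0 else "tie"
        result[benchmark] = (winner, abs(margin))
    return result


results = head_to_head(SCORES, "Sonnet 5", "GPT-5.5")
for benchmark, (winner, margin) in results.items():
    print(f"{benchmark:<20} {winner:<9} +{margin:.1f}pp")
wins = sum(1 for winner, _ in results.values() if winner == "Sonnet 5")
print(f"Sonnet 5 wins {wins} of {len(results)} benchmarks with published scores")
```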

Which Model Should You Use?

For Developers

| Use Case | Recommended Model | Why |
| --- | --- | --- |
| Agentic coding (complex bug fixes, refactoring) | Claude Sonnet 5 | SWE-bench Pro 63.2%, best value |
| IDE-integrated coding (daily work) | GPT-5.5 | CursorBench 64.3%, deeper IDE integration |
| Terminal ops, long-running automation | Claude Sonnet 5 | Terminal-Bench 80.4%, beats GPT-5.5 |
| Mission-critical tasks requiring max accuracy | Claude Opus 4.8 | Still the strongest model overall |

For Enterprises

| Use Case | Recommended Model | Why |
| --- | --- | --- |
| Desktop automation / RPA replacement | Claude Sonnet 5 | OSWorld 81.2%, only 40% of Opus 4.8's cost at promo pricing (60% at standard) |
| Large-scale code review | Claude Sonnet 5 | 1M context + $2 promo input price |
| Customer service automation | GPT-5.5 | Higher GDPval-AA, more productization experience |
| Document analysis, bulk data processing | Gemini 3.1 Pro | 2M context + $2 input, lowest cost |

Budget-First Strategies

| Monthly Budget | Recommended Strategy |
| --- | --- |
| Generous | Use Opus 4.8 for critical tasks, Sonnet 5 for everything else |
| Moderate | Sonnet 5 as your primary model ($2/1M input during the promo, $3 afterward); it covers roughly 90% of scenarios |
| Tight | Maximize Sonnet 5's promo pricing through August, then reassess whether to step down to Gemini |
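As a rough illustration of the "Generous" strategy, the sketch below estimates a monthly bill when critical tasks go to Opus 4.8 and everything else goes to Sonnet 5. The per-task costs are the 100K-input, 10K-output figures from the pricing comparison; the workload size and the share of critical tasks are hypothetical, as are the names used.

```python
# Per-task cost in US cents for 100K input + 10K output tokens,
# taken from the pricing comparison table in this article.
TASK_COST_CENTS = {
    "Opus 4.8": 75,
    "Sonnet 5 (promo)": 30,
    "Sonnet 5 (standard)": 45,
}


def monthly_cost_usd(total_tasks, critical_tasks, default_model="Sonnet 5 (promo)"):
    """Monthly cost when critical tasks use Opus 4.8 and the rest use Sonnet 5."""
    if not 0 <= critical_tasks <= total_tasks:
        raise ValueError("critical_tasks must be between 0 and total_tasks")
    routine_tasks = total_tasks - critical_tasks
    # Work in integer cents to avoid floating-point drift, convert at the end.
    cents = (critical_tasks * TASK_COST_CENTS["Opus 4.8"]
             + routine_tasks * TASK_COST_CENTS[default_model])
    return cents / 100


# Hypothetical workload: 1,000 tasks a month, 100 of them critical.
print(monthly_cost_usd(1000, 100))                          # promo pricing
print(monthly_cost_usd(1000, 100, "Sonnet 5 (standard)"))   # after the promo
print(monthly_cost_usd(1000, 1000))                         # everything on Opus 4.8
```

Under these assumed numbers, the mixed strategy costs $345 a month at promo pricing and $480 afterward, against $750 for sending every task to Opus 4.8.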

The Bottom Line: Sonnet 5 Is the Default for H2 2026

The arrival of Claude Sonnet 5 marks a new chapter in the AI model arms race. A mid-tier model has, for the first time, outperformed the previous generation's flagship across multiple key benchmarks — and it's doing so at a friendlier price point.

Key takeaways:

  • Sonnet 5 beats GPT-5.5 on 5 of 7 benchmarks while costing less than half as much at promo pricing (about 56% of GPT-5.5's cost at standard rates on the typical coding task)
  • Sonnet 5's reasoning nearly matches Opus 4.8 (HLE with tools: 57.4% vs 57.9%) at just 40% of the price during the promo, 60% afterward
  • For Sonnet 4.6 users, upgrading is a no-brainer: better across the board, 5× the context, and a lower price through the promo period
  • Promo pricing runs through August 31, so the $2/$10 window is the best time to try Sonnet 5

The AI model landscape for the second half of 2026 is clear: Sonnet 5 is the default recommendation. Only reach for alternatives when you need maximum precision (Opus 4.8), the largest context window (Gemini 3.1 Pro), or the most mature IDE integration (GPT-5.5).
