Anthropic · Proprietary

Claude Mythos Preview

Anthropic's latest reasoning-focused model. Built on the Mythos architecture, it scores 64.7% on Humanity's Last Exam (HLE, with tools), among other benchmark results, placing it in the top tier for complex reasoning tasks. Its Managed Agents feature enables autonomous tool use and multi-step task execution. The design emphasizes both safety and performance.

Parameters

Undisclosed

Context

200K

License

Proprietary

Release date

2026-04-08

Japanese language capability

High-Quality JP

Multilingual model with strong Japanese language processing capabilities.

API pricing

Input price (per 1M tokens)

$15

Output price (per 1M tokens)

$75

Billing mode: standard
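
As a rough illustration of how per-1M-token pricing translates into a per-request cost, the sketch below multiplies token counts by the listed rates. The function name `estimate_cost_usd` and the example token counts are hypothetical. The rates are passed in as parameters because this page lists $15 / $75 here and $25 / $125 in the in-depth analysis. This is not an official billing formula and it ignores any discounts or surcharges.

```python
from decimal import Decimal

TOKENS_PER_MILLION = Decimal(1_000_000)


def estimate_cost_usd(input_tokens, output_tokens, input_price_per_m, output_price_per_m):
    """Estimate the USD cost of one request from per-1M-token prices.

    Hypothetical helper for illustration only; prices are given in USD
    per one million tokens, as on this page.
    """
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    # Decimal avoids floating-point drift when summing currency amounts.
    input_cost = Decimal(input_tokens) * Decimal(str(input_price_per_m))
    output_cost = Decimal(output_tokens) * Decimal(str(output_price_per_m))
    return (input_cost + output_cost) / TOKENS_PER_MILLION


# Assumed workload: 150,000 input tokens and 4,000 output tokens.
print(estimate_cost_usd(150_000, 4_000, 15, 75))    # 2.55
print(estimate_cost_usd(150_000, 4_000, 25, 125))   # 4.25
```

For the same assumed workload, the two rate pairs listed on this page give $2.55 and $4.25 respectively, so which rate actually applies matters when budgeting.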

Strengths

  • Industry-leading reasoning capabilities
  • Autonomous task execution via Managed Agents
  • Handles long contexts of up to 200K tokens
  • Strong emphasis on safety

Weaknesses

  • High API costs
  • Not open-source
  • Relatively slow inference speed

Use cases

  • Complex reasoning tasks
  • Autonomous agents
  • Long-document analysis and summarization
  • Advanced programming assistance

In-depth analysis

SWE-bench Verified

93.9%

#1 overall; Opus 4.6: 80.8%, GPT-5.5: not reported

GPQA Diamond

94.6%

#1; Opus 4.6: 91.3%, Gemini 3.1 Pro: 94.3%

CyberGym

83.1%

#1; Opus 4.6: 66.6%, GPT-5.5: 81.8%

Humanity's Last Exam (w/ tools)

64.7%

#1; Opus 4.6: 53.1%, GPT-5.4: 52.1%

USAMO 2026

97.6%

Largest single benchmark jump: +55pp over Opus 4.6 (42.3%)

Input/Output Price

$25 / $125 per 1M tokens

5× Opus 4.6; invitation-only via Project Glasswing
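
The percentage-point gaps over Opus 4.6 quoted in the cards above can be recomputed from the listed scores. The following is a minimal sketch using only figures that appear on this page. The `SCORES` table and the `gap_over_opus_46` helper are illustrative names, not part of any Anthropic tooling.

```python
from decimal import Decimal

# (Mythos Preview score, Opus 4.6 score) in percent, copied from the cards on this page.
SCORES = {
    "SWE-bench Verified": ("93.9", "80.8"),
    "GPQA Diamond": ("94.6", "91.3"),
    "CyberGym": ("83.1", "66.6"),
    "Humanity's Last Exam (w/ tools)": ("64.7", "53.1"),
    "USAMO 2026": ("97.6", "42.3"),
}


def gap_over_opus_46(scores):
    """Return the percentage-point gap (Mythos minus Opus 4.6) per benchmark."""
    return {name: Decimal(mythos) - Decimal(opus) for name, (mythos, opus) in scores.items()}


gaps = gap_over_opus_46(SCORES)
largest = max(gaps, key=gaps.get)

# Print the gaps from largest to smallest.
for name, gap in sorted(gaps.items(), key=lambda item: item[1], reverse=True):
    print(f"{name}: +{gap} pp")
```

On these figures the USAMO 2026 gap (+55.3 pp) is the largest, which matches the "+55pp" note on that card.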

Strengths

  • Highest scores ever recorded on SWE-bench Verified (93.9%), CyberGym (83.1%), and USAMO 2026 (97.6%) across all frontier models
  • Autonomous offensive cybersecurity capability unmatched by any public model—discovered thousands of zero-days including 27-year-old OpenBSD and 16-year-old FFmpeg bugs
  • Generational leap in long-horizon agentic and terminal tasks (Terminal-Bench 2.0: 82.0%, reaching 92.1% with extended timeouts)

Weaknesses

  • Not publicly available—restricted to ~52 vetted organizations under Project Glasswing with no planned general availability
  • Extremely expensive at $25/$125 per million tokens, 5× the cost of Opus 4.7, limiting practical adoption even for approved partners
  • Offensive cyber capabilities prompted Anthropic to withhold public release, creating a fundamental access barrier that no benchmark score can overcome

Competitor comparison

Model | Arena | SWE | GPQA | Price
Claude Opus 4.7 | N/A | 87.6% | 94.2% | $5/$25
GPT-5.5 (OpenAI) | N/A | | | Not publicly disclosed
Gemini 3.1 Pro (Google) | N/A | 80.6% | 94.3% | Not publicly disclosed

Claude Mythos Preview, announced April 7, 2026, is Anthropic's most powerful model to date and the first to sit above the Opus tier in Anthropic's hierarchy (Haiku → Sonnet → Opus → Mythos). Internally codenamed 'Capybara,' it represents what Anthropic describes as a 4.3× jump over its previous performance trendline. The model achieves state-of-the-art results across coding, reasoning, cybersecurity, and agentic benchmarks—most notably 93.9% on SWE-bench Verified, 97.6% on USAMO 2026 (a 55-point leap over Opus 4.6), 83.1% on CyberGym, and a saturated 100% on Cybench. Its 1M-token context window and 128K-token output ceiling match the largest in the Claude family.

What distinguishes Mythos from every other frontier model release is its deployment model. Anthropic has explicitly declined to make it generally available, citing offensive cybersecurity capabilities that exceed what they consider safe for unrestricted access. Instead, Mythos is deployed through Project Glasswing, a coalition of 12 major technology companies (AWS, Apple, Google, Microsoft, Cisco, CrowdStrike, NVIDIA, JPMorganChase, Broadcom, Palo Alto Networks, Linux Foundation) plus ~40 additional critical-infrastructure organizations. Anthropic committed $100M in usage credits and $4M in open-source security donations. The model has autonomously discovered thousands of zero-day vulnerabilities across every major operating system and browser, including bugs that evaded millions of automated test runs over 16–27 years.

The strategic implications are significant. Mythos represents a new paradigm where the most capable frontier models may not be broadly accessible. Anthropic's 244-page system card includes a clinical psychiatrist assessment (a first for any Claude model) and white-box interpretability analysis. The company states that Mythos-class capabilities will eventually flow into a future Claude Opus release once safeguards mature. For the broader AI ecosystem, Mythos signals that the gap between 'capable enough to deploy' and 'capable enough to require restriction' is now a live industry question.

Analysis generated: 2026-05-23