모델 목록으로
xAI독점

Grok 4.3 Beta (Early Access)

Grok 4.3 Beta (얼리 액세스)는 xAI가 개발한 추론 모델입니다. 약 5조 개의 파라미터 대규모 구성과 200만 토큰의 매우 긴 컨텍스트 윈도우가 특징입니다.

파라미터

5000.0B

컨텍스트

2000K

라이선스

Proprietary

출시일

2026-05-17

API 가격

이 모델의 API 가격 정보는 현재 공개되지 않았습니다

강점

  • 대규모 ~5조 파라미터
  • 광대한 200만 토큰 컨텍스트 윈도우
  • 고급 추론 전문화

약점

  • 비공개 소스 라이선싱
  • 베타 버전의 잠재적 불안정성
  • 높은 연산 자원 요구

활용 사례

  • 초장문 문서 분석
  • 복잡한 논리적 추론 작업
  • 대규모 데이터의 맥락 처리

심층 분석

Artificial Analysis Intelligence Index

53

#10 overall, +4 vs Grok 4.20

Arena Elo (Text Overall)

1451

9,082 votes; Coding: 1493

GDPval-AA (Agentic Tasks)

1500 ELO

+321 over Grok 4.20; trails GPT-5.5 by 276

Input Price

$1.25/1M tokens

37.5% cheaper than Grok 4.20

Context Window

1M tokens

Grok 4.20 retains 2M for max-context workloads

GPQA Diamond

90.1%

#14 on Easy Benchmarks

Benchmark Run Cost (AA Index)

$395

~20% less than Grok 4.20; vs GPT-5.5: ~$3,959

강점

  • Best cost-per-intelligence ratio in the frontier tier: $1.25/M input places it on the Pareto frontier for intelligence vs. cost, roughly 12× cheaper than Claude Opus 4.7
  • Massive agentic task improvement: +321 ELO on GDPval-AA real-world agentic benchmarks, validated by Starlink's 70% autonomous resolution rate in production
  • First xAI model with native video input and document generation (PDF, PPTX, XLSX), breaking Gemini's monopoly on production-grade video understanding

약점

  • No persistent memory at any tier—including the $300/month SuperGrok Heavy plan—requiring custom memory layers for any stateful application
  • Documented 'narcolepsy' regression: autonomous agent tasks show prolonged inactivity in sustained-action simulations (Andon Labs Vending-Bench 2), a production risk for agentic workflows
  • Coding performance lags Claude Opus 4.7 by ~14 points on SWE-bench (~72% vs ~86%), ruling it out as a primary coding model

경쟁사 비교

ModelArenaSWEGPQAPrice
Claude Opus 4.7~1500~86%~92%~$15/$75
GPT-5.5 (xhigh)~1510~83%~93%$5/$30
Gemini 3.1 Pro Preview~1480~76%~91%~$1.25/$5.00

Grok 4.3 (launched April 30, 2026) is xAI's most cost-efficient frontier model to date, scoring 53 on the Artificial Analysis Intelligence Index while dramatically undercutting competitors on price. The model represents a deliberate strategic pivot: rather than chasing raw intelligence leadership (GPT-5.5 scores 60, Claude Opus 4.7 scores ~62–67), xAI optimized for the price-performance frontier. Input costs dropped 37.5% and output costs 58.3% versus the predecessor Grok 4.20, while intelligence scores actually improved. The headline metric is GDPval-AA, a real-world agentic task benchmark, where Grok 4.3 jumped 321 ELO points to 1500—surpassing Gemini 3.1 Pro, GPT-5.4 mini, and Kimi K2.5—though it still trails GPT-5.5 (xhigh) by 276 ELO points.

Feature-wise, Grok 4.3 introduces several production-relevant capabilities: native video input (breaking Gemini's monopoly on commercial video understanding APIs), built-in document generation (PDF, PowerPoint, spreadsheets directly from conversation), and always-on chain-of-thought reasoning. The model runs at ~100 tokens/second with a 1M-token context window, a reduction from Grok 4.20's 2M tokens, though the older model remains available for maximum-context workloads. Prompt caching at $0.20/M tokens further reduces costs for RAG and repeated-context applications. xAI also launched Grok Imagine Agent Mode for creative production workflows and integrated tighter coupling with Grok Computer, the autonomous desktop agent.

However, Grok 4.3 arrives with notable gaps. Persistent memory remains absent at every tier, including the $300/month SuperGrok Heavy plan. Independent testing by Andon Labs revealed a 'narcolepsy' regression on sustained autonomous tasks—the model sometimes remains idle instead of taking required actions. Coding performance lags Claude Opus 4.7 significantly (~14 points on SWE-bench), and the AA-Omniscience Non-Hallucination Rate actually dropped 8 points versus Grok 4.20, trading reliability for higher accuracy scores. The model is best understood not as a general-purpose frontier leader but as a specialist: the most cost-effective option for long-context agentic workflows, customer support automation, and document-heavy analysis pipelines where intelligence-per-dollar matters more than absolute peak capability.

분석 생성일: 2026-05-23