Overview
DeepSeek V4 Pro, released April 24, 2026, is DeepSeek-AI's flagship open-weights reasoning model and the largest MIT-licensed language model to date at 1.6 trillion total parameters (49 billion active via Mixture-of-Experts routing). It introduces a new V4 architecture with a 1-million-token context window, a hybrid attention mechanism combining Compressed Sparse Attention (CSA) and Heavily Compressed Attention (HCA), and dual thinking/non-thinking inference modes. The model represents a significant generational leap from V3.2, scoring 52 on the Artificial Analysis Intelligence Index (a 10-point gain) and tying or leading closed frontier models on key coding benchmarks.
V4 Pro's core value proposition is delivering near-frontier performance at a fraction of closed-model pricing. At list prices of $1.74/$3.48 per million input/output tokens, it costs roughly one-seventh of Claude Opus 4.7's output pricing while matching it on SWE-Bench Verified (80.6% vs 80.8%) and leading on competitive programming (LiveCodeBench 93.5, Codeforces rating 3206). The efficiency gains come from architectural innovations: at 1M-token context, V4 Pro requires only 27% of V3.2's single-token inference FLOPs and 10% of its KV cache. DeepSeek also released V4 Flash (284B total/13B active) at $0.14/$0.28 per million tokens, which delivers SWE-Bench performance within 1.6 points of Pro at 12x lower cost.
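To make the Pro versus Flash price gap concrete, here is a minimal sketch that prices one workload at the list rates quoted above. The workload size (200M input tokens, 50M output tokens) and the dictionary keys are illustrative assumptions, not DeepSeek model IDs.

```python
# List prices in USD per 1M tokens, taken from the paragraph above.
# The keys are illustrative labels, not confirmed API model IDs.
PRICES = {
    "v4-pro": {"input": 1.74, "output": 3.48},
    "v4-flash": {"input": 0.14, "output": 0.28},
}


def workload_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Return the list-price cost in USD for a given token volume."""
    p = PRICES[model]
    return (input_tokens * p["input"] + output_tokens * p["output"]) / 1_000_000


# Hypothetical workload: 200M input tokens and 50M output tokens.
pro = workload_cost("v4-pro", 200_000_000, 50_000_000)
flash = workload_cost("v4-flash", 200_000_000, 50_000_000)
ratio = pro / flash
print(f"Pro: ${pro:,.2f}  Flash: ${flash:,.2f}  ratio: {ratio:.1f}x")
```

Because Flash is priced at the same multiple below Pro on both input and output, the ratio comes out near 12x regardless of the input/output mix chosen.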
The model is not without significant caveats. It trails Gemini 3.1 Pro substantially on factual knowledge retrieval (SimpleQA-Verified 57.9% vs 75.6%) and trails all three closed frontier models on cross-domain reasoning (Humanity's Last Exam 37.7% vs 39.8-44.4%). Its 94% hallucination rate on AA-Omniscience means the model nearly always generates a response even when it lacks the underlying knowledge. Political censorship is embedded in the model weights themselves, affecting both hosted and self-hosted deployments by default. For teams building production agents, V4 Pro is the strongest open-weights option available, but it requires careful evaluation against workload-specific requirements and compliance constraints.
Benchmarks and Performance
## Detailed Benchmark Scores (V4-Pro Max Effort)
### Coding & Software Engineering
| Benchmark | V4-Pro Max | Claude Opus 4.6 | GPT-5.4 | Gemini 3.1 Pro |
|---|---|---|---|---|
| SWE-Bench Verified | 80.6% | 80.8% | 80.0% | 80.6% |
| SWE-Bench Pro | 55.4% | 57.3% | 57.7% | 54.2% |
| LiveCodeBench Pass@1 | 93.5 | 88.8 | — | 91.7 |
| Codeforces Rating | 3206 | — | 3168 | 3052 |
| Terminal-Bench 2.0 | 67.9% | 65.4% | 75.1% | 68.5% |
### Knowledge & Reasoning
| Benchmark | V4-Pro Max | Claude Opus 4.6 | GPT-5.4 | Gemini 3.1 Pro |
|---|---|---|---|---|
| GPQA Diamond | 90.1% | 91.3% | 93.0% | 94.3% |
| MMLU-Pro | 87.5% | 89.1% | 87.5% | 91.0% |
| IMOAnswerBench | 89.8% | 75.3% | 91.4% | 81.0% |
| HMMT 2026 Feb | 95.2% | 96.2% | 97.7% | 94.7% |
| Humanity's Last Exam | 37.7% | 40.0% | 39.8% | 44.4% |
| SimpleQA-Verified | 57.9% | 46.2% | 45.3% | 75.6% |
### Agentic & Tool Use
| Benchmark | V4-Pro Max | Claude Opus 4.6 | GPT-5.4 | Gemini 3.1 Pro |
|---|---|---|---|---|
| MCPAtlas Public | 73.6% | 73.8% | 67.2% | 69.2% |
| GDPval-AA (Elo) | 1554 | 1619 | 1674 | 1314 |
| BrowseComp | 83.4% | 83.7% | 82.7% | 85.9% |
| Toolathlon | 51.8% | 47.2% | 54.6% | 48.8% |
### Long Context
| Benchmark | V4-Pro Max | Claude Opus 4.6 | Gemini 3.1 Pro |
|---|---|---|---|
| MRCR 1M (MMR) | 83.5% | 92.9% | 76.3% |
| CorpusQA 1M (ACC) | 62.0% | 71.7% | 53.8% |
### Aggregate Index
| Index | V4-Pro Max |
|---|---|
| Artificial Analysis Intelligence Index | 52/100 (#2 open-weights, behind Kimi K2.6 at 54) |
| Hallucination Rate (AA-Omniscience) | 94% (when uncertain, nearly always responds anyway) |
| AA-Omniscience Score | -10 (improved from V3.2's -21) |
Key observations: V4 Pro leads on competitive coding (LiveCodeBench, Codeforces), ties on SWE-Bench Verified, and is strong on math (IMOAnswerBench 89.8%, second only to GPT-5.4). On factual knowledge it beats Claude Opus 4.6 and GPT-5.4 on SimpleQA-Verified but trails Gemini 3.1 Pro by 17.7 points, and it trails all three closed models on HLE and GPQA Diamond. The 1M-token context is genuine, but practitioners report that quality degrades past roughly 800K tokens.
Detailed Comparison
## DeepSeek V4 Pro vs Claude Opus 4.7
Claude Opus 4.7 leads on most shared benchmarks by 4-9 points: GPQA Diamond (94.2% vs 90.1%), HLE (46.9% vs 37.7%), SWE-Bench Pro (64.3% vs 55.4%). V4 Pro ties on SWE-Bench Verified (80.6% vs 80.9%) and wins on competitive programming (Codeforces 3206, LiveCodeBench 93.5). The pricing gap is enormous: V4 Pro at list $3.48/1M output vs Opus 4.7 at $25/1M output — roughly 7x cheaper. Both support 1M-token context windows. Opus 4.7 supports native image input; V4 Pro is text-only (separate vision variant exists but lags). V4 Pro outputs up to 384K tokens vs Opus 4.7's 128K cap, a significant advantage for long single-response generation.
## DeepSeek V4 Pro vs GPT-5.4
GPT-5.4 leads on math (HMMT 97.7% vs 95.2%) and Terminal-Bench 2.0 (75.1% vs 67.9%). V4 Pro leads on competitive programming (Codeforces 3206 vs 3168, LiveCodeBench 93.5) and ties on SWE-Bench Verified. Pricing: GPT-5.4 at $2.50/$15.00 per 1M tokens vs V4 Pro at $1.74/$3.48 — V4 Pro is roughly 4x cheaper on output. V4 Pro's MIT license enables self-hosting; GPT-5.4 is API-only.
## DeepSeek V4 Pro vs Gemini 3.1 Pro
Gemini 3.1 Pro leads on knowledge benchmarks by wide margins: SimpleQA (75.6% vs 57.9%), HLE (44.4% vs 37.7%), GPQA Diamond (94.3% vs 90.1%). V4 Pro leads on competitive programming and ties on SWE-Bench Verified. Gemini's pricing at $2.00/$12.00 per 1M tokens is higher than V4 Pro's list price ($1.74/$3.48) on both input and output: slightly higher on input and roughly 3.4x higher on output. Both support 1M-token context; V4 Pro's long-context retrieval (MRCR 1M: 83.5%) substantially beats Gemini's (76.3%).
## DeepSeek V4 Pro vs Kimi K2.6
Kimi K2.6 scores 54 on the Artificial Analysis Intelligence Index vs V4 Pro's 52, making it the #1 open-weights reasoning model. Kimi K2.6 scores 80.2% on SWE-Bench Verified vs V4 Pro's 80.6% — a virtual tie. Kimi K2.6 is reported at $948 to run the full Artificial Analysis benchmark suite vs V4 Pro at $1,071. Both are MIT-licensed open-weights models.
## Self-Hosting Considerations
V4 Pro weights are 862GB (BF16) and require 8×H100-class hardware for usable throughput. V4 Flash at 160GB is substantially more deployable. Both use FP4/FP8 mixed precision for expert parameters. The MIT license allows full commercial use, redistribution, and fine-tuning without restrictions.
Community Reception
The developer and researcher community has responded to V4 Pro with cautious enthusiasm tempered by practical concerns. On Hugging Face, the model has accumulated over 4.5 million downloads and 4,174 likes in its first two weeks, indicating strong adoption interest.
Practitioners running V4 Pro in production report that coding performance genuinely matches the benchmark claims. One noted on social media that V4 Pro 'holds the thread of what happened three hours ago' in extended sessions, attributing this to the new hybrid attention architecture. Multiple developers report switching agentic coding workloads from Claude Opus to V4 Pro due to the cost differential, with one practitioner stating: 'For agentic coding at a one-seventh output cost, it is the rational pick this week.'
The most common community concern centers on hallucination and factual reliability. The 94% hallucination rate means V4 Pro will confidently generate answers even when it lacks the knowledge, making it unsuitable for tasks requiring high factual accuracy without verification workflows. Developers working on research assistants or knowledge-heavy applications report noticeable gaps compared to Gemini models.
The political censorship issue continues to generate debate. Community testing confirms that censorship is embedded in the training weights rather than applied as an API filter, meaning self-hosted deployments carry the same restrictions by default. Enterprise teams in regulated industries (particularly healthcare, finance, and legal) report they must either self-host with careful output filtering or route sensitive tasks to non-DeepSeek models.
Adoption patterns show V4 Pro being used primarily for: (1) high-volume code generation and review, (2) repository-scale refactoring with its 1M context window, (3) agentic workflows with tool use, and (4) as a cost-optimized backbone in multi-model architectures where harder tasks are routed to Claude or GPT. Several developers report using V4 Flash for routine tasks and V4 Pro only when full reasoning depth is needed, taking advantage of the 12x pricing gap between the two tiers.
The Lambda AI team characterized the release as landing in 'the quietest' way possible relative to expectations, noting that the infrastructure conversation has crowded out the model conversation in 2026. Early independent evaluations are still accumulating, and the community is waiting for the full technical report promised with the GA release.
Use Cases
## 1. High-Volume Agentic Coding and CI/CD Integration
V4 Pro is the strongest choice for teams running millions of code generation calls per day where cost dominates the decision. With output pricing at $3.48/1M tokens at list (promotional: $0.87/1M), it costs roughly 7x less than Claude Opus 4.7 ($25/1M output) at list price while tying on SWE-Bench Verified. Use V4 Pro for code review automation, CI-attached linting suggestions, test generation, and multi-file refactoring. Its 384K output token cap is particularly valuable for generating complete refactored files in a single response. Route the hardest 10% of tasks (complex bug fixes requiring surgical precision) to Claude Opus, and the remaining 90% to V4 Pro to optimize spend.
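The 90/10 split above can be checked with a short calculation. This is a sketch under two assumptions that the text does not state: the split is measured in output tokens rather than task count, and input-token costs are ignored.

```python
PRO_OUTPUT = 3.48    # USD per 1M output tokens, V4 Pro list price
OPUS_OUTPUT = 25.00  # USD per 1M output tokens, Claude Opus 4.7


def blended_output_price(share_to_opus: float) -> float:
    """Average output price per 1M tokens when a share of tokens goes to Opus."""
    if not 0.0 <= share_to_opus <= 1.0:
        raise ValueError("share_to_opus must be between 0 and 1")
    return share_to_opus * OPUS_OUTPUT + (1.0 - share_to_opus) * PRO_OUTPUT


# Assumption: 10% of output tokens go to Opus, 90% to V4 Pro.
blended = blended_output_price(0.10)
savings = 1.0 - blended / OPUS_OUTPUT
print(f"blended: ${blended:.2f}/1M, savings vs all-Opus: {savings:.0%}")
```

Under these assumptions the blended rate is about $5.63 per 1M output tokens, roughly 77% below sending everything to Opus. If the hard tasks produce longer outputs than average, the real savings would be smaller.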
## 2. Long-Context Document Analysis and Research
The 1M-token context window with genuine retrieval capability (MRCR 1M: 83.5%) makes V4 Pro viable for ingesting entire codebases, legal document sets, or technical corpora in a single prompt. Use it for codebase mapping, dependency analysis, contract review, and multi-document summarization. The hybrid attention architecture keeps inference costs manageable at long contexts — V4 Pro requires only 27% of V3.2's FLOPs at 1M tokens. Choose V4 Pro over Gemini 3.1 Pro when you need strong long-context retrieval combined with competitive coding ability; choose Gemini when factual accuracy is paramount.
## 3. Self-Hosted Enterprise Deployment
For organizations with compliance requirements mandating on-premises AI (defense, healthcare, financial services), V4 Pro's MIT license and open weights provide the most capable self-hosted option. Deploy on 8×H100 nodes for production throughput. The political censorship in weights is a known limitation for some use cases, but self-hosting eliminates the data collection concerns present in the API. Use V4 Flash (160GB weights, 13B active params) for lower-priority tasks to reduce hardware requirements. This is the recommended path when data sovereignty is non-negotiable and you cannot use closed API-only models.
## 4. Multi-Model Orchestration Architecture
V4 Pro works best not as a standalone replacement but as the workhorse in a routing architecture. Build a classifier that sends simple queries to V4 Flash ($0.28/1M output), standard coding tasks to V4 Pro ($3.48/1M), and the hardest reasoning/knowledge tasks to Claude Opus 4.7 ($25/1M) or Gemini 3.1 Pro ($12/1M). This tiered approach can reduce total inference costs by 60-80% versus routing everything to the most expensive model. V4 Pro's OpenAI-compatible API means integration into existing agent harnesses (Claude Code, LangChain, AutoGen) requires only a base URL and model ID change.
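A minimal sketch of the tiered routing idea follows. The `Task` type, the `kind` labels, and the tier names are hypothetical; a real router would need an actual classifier and the providers' real model IDs, neither of which is specified here.

```python
from dataclasses import dataclass

# Output list prices in USD per 1M tokens, as quoted above.
# Keys are illustrative tier labels, not confirmed API model IDs.
OUTPUT_PRICE = {
    "v4-flash": 0.28,
    "v4-pro": 3.48,
    "claude-opus-4.7": 25.00,
}


@dataclass
class Task:
    prompt: str
    kind: str  # hypothetical label from an upstream classifier


def route(task: Task) -> str:
    """Map a task label to a tier; unrecognized labels escalate to the top tier."""
    if task.kind == "simple":
        return "v4-flash"
    if task.kind == "coding":
        return "v4-pro"
    return "claude-opus-4.7"


def output_cost(tasks: list[Task], tokens_per_task: int) -> float:
    """Total output cost in USD, assuming every task emits the same token count."""
    return sum(OUTPUT_PRICE[route(t)] * tokens_per_task / 1_000_000 for t in tasks)


# Hypothetical mix: 30 simple, 50 coding, 20 hard tasks at 10,000 output tokens each.
tasks = (
    [Task("rename a variable", "simple")] * 30
    + [Task("refactor a module", "coding")] * 50
    + [Task("prove a lemma", "hard")] * 20
)
routed = output_cost(tasks, 10_000)
all_top = len(tasks) * OUTPUT_PRICE["claude-opus-4.7"] * 10_000 / 1_000_000
print(f"routed: ${routed:.2f}  all top tier: ${all_top:.2f}  savings: {1 - routed / all_top:.0%}")
```

The savings figure depends entirely on the assumed task mix and on equal output lengths per task; escalating unknown labels to the most capable tier is a conservative design choice that trades cost for safety.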
Latest News
## DeepSeek V4 Preview Release (April 24, 2026)
DeepSeek released V4 Pro and V4 Flash as preview models on April 24, 2026, marking the first new architecture since V3. The models are available on DeepSeek's first-party API and through 15+ third-party providers including Together, Novita, Fireworks, DeepInfra, SiliconFlow, and Alibaba. Weights are on Hugging Face under the MIT license.
## Promotional Pricing Through May 31, 2026
DeepSeek is running a 75% discount on V4 Pro pricing through May 31, 2026: $0.435/1M input and $0.87/1M output (down from list prices of $1.74/$3.48). Cache-hit input pricing is $0.003625/1M during the promo period. After the promotional period, pricing returns to list rates.
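The promotional rates follow directly from the list prices and the 75% discount, as this short sketch shows. The `price_on` helper is hypothetical; it treats May 31, 2026 as the last promotional day and does not model the promotion's start date or time zone, which the text does not specify.

```python
from datetime import date

LIST_PRICES = {"input": 1.74, "output": 3.48}  # USD per 1M tokens
DISCOUNT = 0.75
PROMO_END = date(2026, 5, 31)  # assumed inclusive

# Apply the discount to each list price.
PROMO_PRICES = {k: round(v * (1 - DISCOUNT), 4) for k, v in LIST_PRICES.items()}


def price_on(day: date) -> dict:
    """Return the assumed per-1M-token prices in effect on a given day."""
    return PROMO_PRICES if day <= PROMO_END else LIST_PRICES


print(PROMO_PRICES)
print(price_on(date(2026, 6, 1)))
```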
## Legacy API Retirement (July 24, 2026)
The legacy `deepseek-chat` and `deepseek-reasoner` model IDs will be fully retired and inaccessible after July 24, 2026 at 15:59 UTC. They currently route to DeepSeek V4 Flash (non-thinking and thinking modes respectively). Migration requires only a one-line model swap; the base URL does not change.
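A migration check might look like the sketch below. The legacy IDs and the retirement timestamp come from the text above, but the target ID `deepseek-v4-flash` and the `thinking` flag are placeholders: the actual V4 model IDs and the way thinking mode is selected are not given here and should be taken from DeepSeek's API documentation.

```python
from datetime import datetime, timezone

# Retirement time stated above: July 24, 2026 at 15:59 UTC.
RETIREMENT = datetime(2026, 7, 24, 15, 59, tzinfo=timezone.utc)

# Placeholder targets: the real V4 model IDs are an assumption, not confirmed.
LEGACY_TO_V4 = {
    "deepseek-chat": {"model": "deepseek-v4-flash", "thinking": False},
    "deepseek-reasoner": {"model": "deepseek-v4-flash", "thinking": True},
}


def migrate(model_id: str) -> dict:
    """Map a legacy model ID to its placeholder V4 target; pass other IDs through."""
    return dict(LEGACY_TO_V4.get(model_id, {"model": model_id, "thinking": None}))


def is_retired(model_id: str, now: datetime) -> bool:
    """True if the ID is a legacy ID and the retirement time has passed."""
    return model_id in LEGACY_TO_V4 and now > RETIREMENT


print(migrate("deepseek-reasoner"))
print(is_retired("deepseek-chat", datetime(2026, 8, 1, tzinfo=timezone.utc)))
```

Running a check like `is_retired` over configuration files before the cutoff is one way to find call sites that still use the legacy IDs.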
## Full Technical Report Pending
DeepSeek has labeled V4 as a 'preview' release, with a full technical report expected with the GA (general availability) build. Benchmark numbers cited in initial coverage are from DeepSeek's release materials and have not yet been fully independently replicated. Scores may shift on the final GA model.
## Two-Tier Architecture Innovation
V4 marks DeepSeek's first two-model lineup: V4 Pro (1.6T total/49B active) for maximum capability and V4 Flash (284B total/13B active) for faster, lower-cost inference. Both share the same architecture family but target different performance/cost profiles. Flash delivers SWE-Bench performance within 1.6 points of Pro at 12x lower output cost.
## Infrastructure Partnerships
Lambda AI announced deployment support for both V4 Pro and V4 Flash, with model cards and serving guidance. NVIDIA and Lambda are co-designing infrastructure optimized for open models like V4, as demonstrated in MLPerf Inference V6 results. Multiple cloud inference providers have added V4 endpoints within days of release.