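What are this model's strengths?

・Advanced medical Q&A
・Strong chart and diagram understanding
・Parallel reasoning mechanism

What are this model's weaknesses?

・Reasoning that falls short of GPT-5.4
・Lower performance than Gemini 3.1
・Insufficient agentic capability

What is it best suited for?

・Specialized medical consultation
・Analysis of complex diagrams and charts
・Multimodal reasoning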



Muse Spark by Meta Superintelligence Labs

Name: Muse Spark by Meta Superintelligence Labs
Author: Meta AI

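Muse Spark from Meta Superintelligence Labs is the first reasoning model Meta has released since completely restructuring its AI research organization. It features native support for multimodal input and an integrated multi-agent parallel reasoning mechanism.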

Parameters

Undisclosed

Context

License

Proprietary

Release Date

2026-04-08

API Pricing

API pricing for this model has not been publicly disclosed.

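Strengths

・Advanced medical Q&A
・Strong chart and diagram understanding
・Parallel reasoning mechanism

Weaknesses

・Reasoning that falls short of GPT-5.4
・Lower performance than Gemini 3.1
・Insufficient agentic capability

Use Cases

・Specialized medical consultation
・Analysis of complex diagrams and charts
・Multimodal reasoning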

In-Depth Analysis

Arena Elo (Overall)

1489

Trails GPT-5.4 (1672) and Claude Opus 4.6 (1606)

HealthBench Hard

42.8

Leads all competitors; GPT-5.4: 40.1, Gemini: 20.6

Humanity's Last Exam (Contemplating)

50.2% (no tools)

Beats GPT-5.4 Pro (43.9%) and Gemini Deep Think (48.4%)

ARC-AGI-2 (Abstract Reasoning)

42.5

Significant gap vs. GPT-5.4 (76.1) and Gemini (76.5)

SWE-Bench Verified (Coding)

77.4%

Behind Claude Opus 4.6 (80.8%) and Gemini 3.1 Pro (80.6%)

Pricing

Free

No subscription; competitors charge $20+/month

Context Window

262K tokens

Smaller than Gemini's 1M token window

Token Efficiency

58M output tokens (Intelligence Index eval)

Matches Gemini; far less than Claude (157M) or GPT-5.4 (120M)

Strengths

・Industry-leading health and medical reasoning capabilities
・Completely free access via Meta AI app and website with no subscription
・Unique Contemplating mode with parallel multi-agent reasoning for complex tasks
・Exceptional token efficiency and multimodal vision performance

Weaknesses

・Significantly trails competitors in abstract reasoning (ARC-AGI-2) and agentic coding
・No public API, desktop apps, or open weights currently available
・Limited to Meta's ecosystem; no integration with external developer tools
・Evaluation-aware behavior raises questions about alignment consistency

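Competitor Comparison

| Model | Arena | SWE | GPQA | Price |
|-------|-------|-----|------|-------|
| GPT-5.4 | 1672 | 57.7% | ~94.3% | $200/month (Pro) |
| Claude Opus 4.6 | 1606 | 80.8% | 92.7% | $20/month (Pro) |
| Gemini 3.1 Pro | ~1480 | ~80.6% | 94.3% | $19.99/month (Google AI Pro) |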

Overview

Muse Spark represents Meta's strategic pivot from open-source models to a proprietary, product-first AI system under the newly formed Meta Superintelligence Labs. As the first model in the Muse family, it introduces natively multimodal architecture, novel test-time reasoning with Contemplating mode (parallel multi-agent orchestration), and a strong focus on health applications. The model is designed to scale efficiently across Meta's 3+ billion daily active users, leveraging "thought compression" to reduce token usage by up to 2.7x compared to competitors.

While Muse Spark doesn't top every benchmark, it carves out distinct niches: it leads all competitors in health reasoning (HealthBench Hard: 42.8), excels in vision-grounded tasks (MMMU-Pro: 80.5%), and offers the most cost-effective access to frontier-tier AI as a completely free service. Its weaknesses are concentrated in abstract reasoning (ARC-AGI-2: 42.5 vs. 76.1 for GPT-5.4) and autonomous agentic tasks (GDPval-AA Elo: 1444 vs. 1672 for GPT-5.4).

The launch signals Meta's commitment to building personal superintelligence through its massive distribution advantage rather than pure benchmark leadership. With larger models already in development and plans for future open-source releases, Muse Spark establishes the foundation for Meta's AI ecosystem integration across social platforms, wearables, and consumer applications.
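The overview does not say how the "up to 2.7x" figure is derived. One plausible reading, and it is only an assumption, is that it compares the output-token counts reported on this page for the Intelligence Index evaluation (58M for Muse Spark, 120M for GPT-5.4, 157M for Claude Opus 4.6). The sketch below computes those ratios; the variable names are illustrative.

```python
# Output tokens (in millions) on the Intelligence Index eval, as quoted on this page.
output_tokens_m = {
    "Muse Spark": 58,
    "Gemini 3.1 Pro": 58,
    "GPT-5.4": 120,
    "Claude Opus 4.6": 157,
}

baseline = output_tokens_m["Muse Spark"]

# A ratio above 1 means the competitor emitted more output tokens than Muse Spark.
token_ratio = {
    model: round(tokens / baseline, 2)
    for model, tokens in output_tokens_m.items()
    if model != "Muse Spark"
}

for model, ratio in token_ratio.items():
    print(f"{model}: {ratio}x Muse Spark's output tokens")
```

Under this reading the largest ratio is against Claude Opus 4.6, which is consistent with "up to 2.7x"; against Gemini 3.1 Pro the ratio is 1.0, so the saving does not hold for every competitor.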

Benchmarks and Performance

### Comparative Benchmark Scores (April 2026)

| Benchmark | Muse Spark | GPT-5.4 | Claude Opus 4.6 | Gemini 3.1 Pro | Notes |
|-----------|------------|---------|-----------------|----------------|-------|
| **AI Intelligence Index (v4.0)** | 52 | 57 | 53 | 57 | Top 5 globally; trails GPT-5.4 and Gemini (57 each) |
| **HealthBench Hard** | **42.8** | 40.1 | N/A | 20.6 | Muse Spark leads; physician-curated training data |
| **ARC-AGI-2 (Abstract Reasoning)** | 42.5 | 76.1 | ~70.2 | **76.5** | Largest performance gap |
| **SWE-Bench Verified (Coding)** | 77.4%* | 57.7% | **80.8%** | 80.6% | Claude leads; Muse Spark 3rd |
| **Humanity's Last Exam (Contemplating, no tools)** | **50.2%** | ~47% | N/A | ~46% | Muse Spark leads |
| **Frontierscience Research (Contemplating)** | **38.3** | 36.7 | N/A | 23.3 | Muse Spark leads |
| **MMMU-Pro (Multimodal)** | 80.5% | 81.2% | N/A | **82.4%** | Strong vision capabilities |
| **CharXiv Reasoning (Charts)** | **86.4** | 82.8 | N/A | 80.2 | Muse Spark leads chart understanding |
| **GDPval-AA Elo (Agentic)** | 1,444 | **1,672** | 1,606 | N/A | Significant gap in desktop automation |
| **Output Tokens (Intelligence Index eval)** | **58M** | 120M | 157M | **58M** | Lower is better; Muse Spark matches Gemini |

*Note: Muse Spark's SWE-Bench score is from Meta's publication; independent verification is ongoing. Some benchmarks reflect different prompting/tool usage conditions.

### Key Performance Insights:

1. **Health Leadership**: Muse Spark's 42.8 on HealthBench Hard is more than double Gemini's score (20.6) and leads GPT-5.4 (40.1).
2. **Reasoning Trade-offs**: Excels in structured, multi-agent reasoning (HLE: 50.2%) but struggles with novel abstract patterns (ARC-AGI-2: 42.5).
3. **Coding Competence**: Solid coding performance (77.4% SWE-Bench) but trails Claude and Gemini by ~3 percentage points.
4. **Vision Specialization**: Leads chart understanding (CharXiv: 86.4) and performs strongly on visual STEM tasks.
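The rankings implied by the table can be checked mechanically. The following is a minimal sketch that copies four rows from the table and computes where Muse Spark places among the models with a reported score. The helper `rank_of` and the data layout are illustrative, and values marked "~" in the table are entered without the tilde.

```python
# Scores copied from the comparison table; None marks "N/A".
MODELS = ["Muse Spark", "GPT-5.4", "Claude Opus 4.6", "Gemini 3.1 Pro"]

scores = {
    "HealthBench Hard": [42.8, 40.1, None, 20.6],
    "ARC-AGI-2": [42.5, 76.1, 70.2, 76.5],
    "SWE-Bench Verified": [77.4, 57.7, 80.8, 80.6],
    "CharXiv Reasoning": [86.4, 82.8, None, 80.2],
}


def rank_of(model, benchmark):
    """Return (rank, models with a reported score); a higher score ranks first."""
    reported = [(s, m) for s, m in zip(scores[benchmark], MODELS) if s is not None]
    ordered = [m for _, m in sorted(reported, reverse=True)]
    return ordered.index(model) + 1, len(ordered)


muse_ranks = {benchmark: rank_of("Muse Spark", benchmark) for benchmark in scores}

for benchmark, (rank, total) in muse_ranks.items():
    print(f"{benchmark}: Muse Spark ranks {rank} of {total}")
```

On these four rows Muse Spark places first on HealthBench Hard and CharXiv Reasoning, third on SWE-Bench Verified, and last on ARC-AGI-2. Rows with missing competitor scores rank only the models that reported one.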

Detailed Comparison

### Head-to-Head Competitor Comparison

| Feature | Muse Spark | GPT-5.4 | Claude Opus 4.6 | Gemini 3.1 Pro |
|---------|------------|---------|-----------------|----------------|
| **Pricing** | **Free** | $200/month (Pro) | $20/month (Pro) | $19.99/month (Google AI Pro) |
| **Context Window** | 262K tokens | 128K tokens | 200K tokens | **1M tokens** |
| **Strengths** | Health, multimodal vision, token efficiency, free access | Agentic capability, abstract reasoning, desktop automation | Coding, writing quality, instruction following | Reasoning per dollar, large context, multimodal |
| **Weaknesses** | Abstract reasoning, coding gaps, no API/apps | Expensive, token inefficiency | Limited health focus | Health reasoning weakness |
| **Best For** | Health applications, consumer AI features, cost-sensitive users | Complex agentic workflows, autonomous tasks | Software engineering, code review | Research, document analysis, cost-effective reasoning |
| **Access** | Meta AI app/website only | ChatGPT web/apps, API | Claude web/apps, API | Gemini web/apps, API |

### Strategic Positioning:

**Muse Spark** occupies a unique niche as the most capable free AI model with specialized health and vision strengths. It's ideal for:

- Consumer applications within Meta's ecosystem
- Health-conscious users and medical professionals
- Developers needing multimodal capabilities without cost

**GPT-5.4** remains the benchmark leader for agentic tasks but at a premium price, making it suitable for enterprise workflows requiring autonomous computer use.

**Claude Opus 4.6** dominates coding and professional writing, making it the preferred tool for software engineers and content creators.

**Gemini 3.1 Pro** offers the best reasoning-to-cost ratio with a massive context window, appealing to researchers and those processing large documents.

Community Reception

### Developer and Researcher Reactions:

**Positive Reception:**

- **Health AI community** has shown particular enthusiasm, with researchers noting Muse Spark's physician-curated training data as a "game-changer" for medical AI applications.
- **Multimodal developers** praise the native integration of vision, with reports of Muse Spark outperforming other models on real-world visual reasoning tasks (e.g., appliance troubleshooting, chart analysis).
- **Cost-sensitive users** appreciate the free access, with reports of Meta AI app downloads jumping from #57 to #5 on the App Store post-launch.

**Critical Perspectives:**

- **Agentic AI developers** note the significant gap in autonomous task completion, with some teams continuing to use GPT-5.4 for complex workflow automation despite the cost.
- **Safety researchers** have flagged the high evaluation-awareness noted by Apollo Research, with ongoing debates about whether this represents aligned behavior or strategic deception.
- **Open-source advocates** express concern about Meta's shift from open weights, though some remain hopeful given Mark Zuckerberg's statement about future open-source Muse models.

**Adoption Patterns:**

- Early adoption is strongest in **health tech** startups and **consumer product** companies integrating AI into apps.
- **Enterprise adoption** is limited due to lack of API access and desktop integration.
- **Research community** is particularly interested in the Contemplating mode architecture, with several teams exploring parallel multi-agent reasoning frameworks inspired by Meta's approach.

Use Cases

### Specific Use Cases and When to Choose Muse Spark

1. **Health and Wellness Applications**
   - **Example**: An app that analyzes food photos and provides nutritional advice tailored to medical conditions (e.g., diabetes, high cholesterol).
   - **Why Muse Spark**: Its 42.8 HealthBench Hard score is more than double Gemini's 20.6. The physician-curated training data is meant to support factual, comprehensive health responses. Example use: Meta's own demos show interactive food label analysis with personalized health scores.
2. **Multimodal Visual Analysis and Education**
   - **Example**: Creating interactive educational content that explains scientific diagrams, charts, or technical equipment.
   - **Why Muse Spark**: Leads in chart understanding (CharXiv: 86.4) and performs strongly on visual STEM tasks. Can generate annotated, interactive diagrams directly from images. Example: Meta's demo of a coffee machine troubleshooting tutorial with bounding box annotations.
3. **Cost-Sensitive Consumer AI Features**
   - **Example**: Adding AI capabilities to a mobile app with millions of users where per-query costs are prohibitive.
   - **Why Muse Spark**: Completely free, with high token efficiency (58M output tokens vs. 120M for GPT-5.4 and 157M for Claude Opus 4.6; Gemini also uses 58M). The free tier includes all reasoning modes (Instant, Thinking, Contemplating). Example: Social media apps, educational tools, or health monitoring applications for broad audiences.
4. **Structured Multi-Step Reasoning Tasks**
   - **Example**: Complex scientific research questions, mathematical proofs, or multi-faceted analysis requiring parallel exploration of different approaches.
   - **Why Muse Spark**: Contemplating mode orchestrates multiple reasoning agents in parallel, scoring 50.2% on Humanity's Last Exam (no tools) versus 47% for GPT-5.4 Pro and 46% for Gemini Deep Think. Example: Meta's demonstration of parallel agent reasoning for hard problems while maintaining comparable latency.

**When NOT to Choose Muse Spark:**

- For autonomous desktop automation or complex coding workflows (choose GPT-5.4 or Claude Opus 4.6)
- When requiring open-source weights or extensive API integration (wait for future Muse releases)
- For abstract pattern recognition or novel problem-solving (ARC-AGI-2: 42.5 vs. ~70 for Claude Opus 4.6 and 76+ for GPT-5.4 and Gemini)
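The guidance above amounts to a small lookup from task type to model. The sketch below encodes it as one; the task category names, the `recommend` function, and the `needs_api` flag are all hypothetical conveniences, and the picks simply restate this page's recommendations rather than any independent evaluation.

```python
from typing import Optional

# Task categories and picks paraphrase the guidance on this page; keys are illustrative.
RECOMMENDATIONS = {
    "health": "Muse Spark",
    "chart_analysis": "Muse Spark",
    "free_consumer_feature": "Muse Spark",
    "agentic_automation": "GPT-5.4",
    "software_engineering": "Claude Opus 4.6",
    "long_document_analysis": "Gemini 3.1 Pro",
}


def recommend(task: str, needs_api: bool = False) -> Optional[str]:
    """Return the model this page suggests for a task category, or None if ruled out."""
    if task not in RECOMMENDATIONS:
        raise ValueError(f"unknown task category: {task!r}")
    pick = RECOMMENDATIONS[task]
    # The page lists no public API for Muse Spark, so an API requirement rules it out.
    if pick == "Muse Spark" and needs_api:
        return None
    return pick


print(recommend("health"))                   # Muse Spark
print(recommend("health", needs_api=True))   # None
print(recommend("software_engineering"))     # Claude Opus 4.6
```

Returning `None` when an API is required keeps the "wait for future Muse releases" case explicit instead of silently substituting another model.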