
Meta AI · Proprietary

Muse Spark by Meta Superintelligence Labs

Name: Muse Spark by Meta Superintelligence Labs
Author: Meta AI

Muse Spark by Meta Superintelligence Labs is the first reasoning model announced by Meta after a comprehensive reorganization of its AI research structure. It features native support for multimodal input and an integrated multi-agent parallel reasoning mechanism.

Parameters

Undisclosed

Context Window

262K tokens

License

Proprietary

Release Date

2026-04-08

API Pricing

API pricing for this model is not yet available

Strengths

・Advanced medical Q&A capabilities
・Excellent chart and diagram understanding
・Parallel reasoning mechanism

Weaknesses

・Abstract reasoning below GPT-5.4 (ARC-AGI-2: 42.5 vs. 76.1)
・Trails Gemini 3.1 Pro on ARC-AGI-2 and SWE-Bench Verified
・Limited agentic task capabilities

Use Cases

・Professional medical consultation
・Complex diagram and chart analysis
・Multimodal reasoning

Deep Analysis

Arena Elo (Overall)

1489

Trails GPT-5.4 (1672) and Claude Opus 4.6 (1606)

HealthBench Hard

42.8

Leads all competitors; GPT-5.4: 40.1, Gemini: 20.6

Humanity's Last Exam (Contemplating)

50.2% (no tools)

Beats GPT-5.4 Pro (43.9%) and Gemini Deep Think (48.4%)

ARC-AGI-2 (Abstract Reasoning)

42.5

Significant gap vs. GPT-5.4 (76.1) and Gemini (76.5)

SWE-Bench Verified (Coding)

77.4%

Behind Claude Opus 4.6 (80.8%) and Gemini 3.1 Pro (80.6%)

Pricing

Free

No subscription; competitors charge $20+/month

Context Window

262K tokens

Smaller than Gemini's 1M token window

Token Efficiency

58M output tokens (Intelligence Index eval)

Matches Gemini; far less than Claude (157M) or GPT-5.4 (120M)
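
The efficiency comparison is simple arithmetic on the reported token counts. A minimal sketch, assuming the figures quoted on this page (in millions of output tokens) and taking "Gemini" to mean Gemini 3.1 Pro; the dictionary and function names are illustrative only:

```python
# Output tokens (in millions) used on the Intelligence Index eval, as quoted on this page.
OUTPUT_TOKENS_M = {
    "Muse Spark": 58,
    "Gemini 3.1 Pro": 58,
    "GPT-5.4": 120,
    "Claude Opus 4.6": 157,
}


def token_ratio(model: str, baseline: str = "Muse Spark") -> float:
    """Return how many times more output tokens `model` used than `baseline`."""
    return OUTPUT_TOKENS_M[model] / OUTPUT_TOKENS_M[baseline]


for name in OUTPUT_TOKENS_M:
    print(f"{name}: {token_ratio(name):.2f}x the output tokens of Muse Spark")
```

On these numbers Claude Opus 4.6 comes out at roughly 2.7x and GPT-5.4 at roughly 2.1x the output tokens of Muse Spark.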

Strengths

・Industry-leading health and medical reasoning capabilities
・Completely free access via Meta AI app and website with no subscription
・Unique Contemplating mode with parallel multi-agent reasoning for complex tasks
・Exceptional token efficiency and multimodal vision performance

Weaknesses

・Significantly trails competitors in abstract reasoning (ARC-AGI-2) and agentic coding
・No public API, desktop apps, or open weights currently available
・Limited to Meta's ecosystem; no integration with external developer tools
・Evaluation-aware behavior raises questions about alignment consistency

Competitor Comparison

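| Model | Arena Elo | SWE-Bench Verified | GPQA | Price |
|-------|-----------|--------------------|------|-------|
| GPT-5.4 | 1672 | 57.7% | ~94.3% | $200/month (Pro) |
| Claude Opus 4.6 | 1606 | 80.8% | 92.7% | $20/month (Pro) |
| Gemini 3.1 Pro | ~1480 | ~80.6% | 94.3% | $19.99/month (Google AI Pro) |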

Overview

Muse Spark represents Meta's strategic pivot from open-source models to a proprietary, product-first AI system under the newly formed Meta Superintelligence Labs. As the first model in the Muse family, it introduces a natively multimodal architecture, novel test-time reasoning with Contemplating mode (parallel multi-agent orchestration), and a strong focus on health applications. The model is designed to scale efficiently across Meta's 3+ billion daily active users, leveraging "thought compression" to reduce token usage by up to 2.7x compared to competitors.

While Muse Spark doesn't top every benchmark, it carves out distinct niches: it leads all competitors in health reasoning (HealthBench Hard: 42.8), excels in vision-grounded tasks (MMMU-Pro: 80.5%), and offers the most cost-effective access to frontier-tier AI as a completely free service. Its weaknesses are concentrated in abstract reasoning (ARC-AGI-2: 42.5 vs. 76.1 for GPT-5.4) and autonomous agentic tasks (GDPval-AA Elo: 1444 vs. 1672 for GPT-5.4).

The launch signals Meta's commitment to building personal superintelligence through its massive distribution advantage rather than pure benchmark leadership. With larger models already in development and plans for future open-source releases, Muse Spark establishes the foundation for Meta's AI ecosystem integration across social platforms, wearables, and consumer applications.
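
Meta has not published how Contemplating mode orchestrates its agents, so the following is only a minimal sketch of the general idea of parallel multi-agent reasoning: several independent agents answer the same question concurrently and their answers are aggregated. The `Agent` type, the toy agents, the `contemplate` function, and the majority-vote aggregation are all assumptions made for illustration, not Meta's method.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List

# Hypothetical stand-in for a reasoning agent: takes a question, returns an answer.
Agent = Callable[[str], str]


def contemplate(question: str, agents: List[Agent]) -> str:
    """Run every agent on the question in parallel and return the majority answer."""
    with ThreadPoolExecutor(max_workers=len(agents)) as pool:
        # pool.map preserves the order of `agents` in the returned answers.
        answers = list(pool.map(lambda agent: agent(question), agents))
    # Majority vote is an assumed aggregation rule; on a tie, the answer
    # that appeared first in `answers` wins.
    return Counter(answers).most_common(1)[0][0]


# Toy agents that return fixed answers, standing in for real model calls.
def arithmetic_agent(question: str) -> str:
    return "4"


def careless_agent(question: str) -> str:
    return "5"


def symbolic_agent(question: str) -> str:
    return "4"


print(contemplate("What is 2 + 2?", [arithmetic_agent, careless_agent, symbolic_agent]))
```

Because the agents run concurrently, wall-clock time in a setup like this is bounded by the slowest agent rather than the sum of all agents, which is one plausible way a parallel mode could add reasoning effort without a proportional latency cost.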

Benchmarks & Performance

### Comparative Benchmark Scores (April 2026)

| Benchmark | Muse Spark | GPT-5.4 | Claude Opus 4.6 | Gemini 3.1 Pro | Notes |
|-----------|------------|---------|-----------------|----------------|-------|
| **AI Intelligence Index (v4.0)** | 52 | 57 | 53 | 57 | Top 5 globally; trails GPT-5.4 and Gemini, which tie at 57 |
| **HealthBench Hard** | **42.8** | 40.1 | N/A | 20.6 | Muse Spark leads; physician-curated training data |
| **ARC-AGI-2 (Abstract Reasoning)** | 42.5 | **76.1** | ~70.2 | **76.5** | Largest performance gap |
| **SWE-Bench Verified (Coding)** | 77.4%* | 57.7% | **80.8%** | 80.6% | Claude leads; Muse Spark third |
| **Humanity's Last Exam (Contemplating, no tools)** | **50.2%** | ~47% | N/A | ~46% | Muse Spark leads |
| **Frontierscience Research (Contemplating)** | **38.3** | 36.7 | N/A | 23.3 | Muse Spark leads |
| **MMMU-Pro (Multimodal)** | 80.5% | 81.2% | N/A | **82.4%** | Strong vision capabilities |
| **CharXiv Reasoning (Charts)** | **86.4** | 82.8 | N/A | 80.2 | Muse Spark leads chart understanding |
| **GDPval-AA Elo (Agentic)** | 1,444 | **1,672** | 1,606 | N/A | Significant gap in desktop automation |
| **Output Tokens (Intelligence Index eval)** | **58M** | 120M | 157M | 58M | Exceptional token efficiency |

\*Note: Muse Spark's SWE-Bench score is from Meta's publication; independent verification is ongoing. Some benchmarks reflect different prompting and tool-usage conditions.

### Key Performance Insights:

1. **Health Leadership**: Muse Spark's 42.8 on HealthBench Hard is more than double Gemini's score (20.6) and leads GPT-5.4 (40.1).
2. **Reasoning Trade-offs**: Excels in structured, multi-agent reasoning (HLE: 50.2%) but struggles with novel abstract patterns (ARC-AGI-2: 42.5).
3. **Coding Competence**: Solid coding performance (77.4% SWE-Bench) but trails Claude and Gemini by ~3 percentage points.
4. **Vision Specialization**: Leads chart understanding (CharXiv: 86.4) and performs strongly on visual STEM tasks.
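
The "leads" and "trails" claims above can be checked by subtracting the best competing score from Muse Spark's score on each benchmark. A minimal sketch using a subset of the table's figures; approximate ("~") values are stored as plain numbers, N/A entries as `None`, and the helper name `gap_to_best_rival` is an assumption for illustration:

```python
from typing import Dict, Optional

# Scores copied from the comparison table; None marks N/A entries.
SCORES: Dict[str, Dict[str, Optional[float]]] = {
    "HealthBench Hard": {
        "Muse Spark": 42.8, "GPT-5.4": 40.1, "Claude Opus 4.6": None, "Gemini 3.1 Pro": 20.6,
    },
    "ARC-AGI-2": {
        "Muse Spark": 42.5, "GPT-5.4": 76.1, "Claude Opus 4.6": 70.2, "Gemini 3.1 Pro": 76.5,
    },
    "SWE-Bench Verified": {
        "Muse Spark": 77.4, "GPT-5.4": 57.7, "Claude Opus 4.6": 80.8, "Gemini 3.1 Pro": 80.6,
    },
    "CharXiv Reasoning": {
        "Muse Spark": 86.4, "GPT-5.4": 82.8, "Claude Opus 4.6": None, "Gemini 3.1 Pro": 80.2,
    },
}


def gap_to_best_rival(benchmark: str, model: str = "Muse Spark") -> float:
    """Return the model's score minus the best competing score (positive means it leads)."""
    row = SCORES[benchmark]
    rivals = [score for name, score in row.items() if name != model and score is not None]
    return round(row[model] - max(rivals), 1)


for benchmark in SCORES:
    print(f"{benchmark}: {gap_to_best_rival(benchmark):+.1f}")
```

With these inputs the gaps are positive on HealthBench Hard and CharXiv Reasoning and negative on ARC-AGI-2 and SWE-Bench Verified, matching the summary in the insights list.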

Detailed Comparison

### Head-to-Head Competitor Comparison

| Feature | Muse Spark | GPT-5.4 | Claude Opus 4.6 | Gemini 3.1 Pro |
|---------|------------|---------|-----------------|----------------|
| **Pricing** | **Free** | $200/month (Pro) | $20/month (Pro) | $19.99/month (Google AI Pro) |
| **Context Window** | 262K tokens | 128K tokens | 200K tokens | **1M tokens** |
| **Strengths** | Health, multimodal vision, token efficiency, free access | Agentic capability, abstract reasoning, desktop automation | Coding, writing quality, instruction following | Reasoning per dollar, large context, multimodal |
| **Weaknesses** | Abstract reasoning, coding gaps, no API/apps | Expensive, token inefficiency | Limited health focus | Health reasoning weakness |
| **Best For** | Health applications, consumer AI features, cost-sensitive users | Complex agentic workflows, autonomous tasks | Software engineering, code review | Research, document analysis, cost-effective reasoning |
| **Access** | Meta AI app/website only | ChatGPT web/apps, API | Claude web/apps, API | Gemini web/apps, API |

### Strategic Positioning:

**Muse Spark** occupies a unique niche as the most capable free AI model with specialized health and vision strengths. It's ideal for:

- Consumer applications within Meta's ecosystem
- Health-conscious users and medical professionals
- Developers needing multimodal capabilities without cost

**GPT-5.4** remains the benchmark leader for agentic tasks but at a premium price, making it suitable for enterprise workflows requiring autonomous computer use.

**Claude Opus 4.6** dominates coding and professional writing, making it the preferred tool for software engineers and content creators.

**Gemini 3.1 Pro** offers the best reasoning-to-cost ratio with a massive context window, appealing to researchers and those processing large documents.
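
The context-window row translates directly into a fit check for a given prompt size. A minimal sketch, assuming "262K", "128K", "200K", and "1M" mean round thousands and millions of tokens (the exact limits may differ) and using a hypothetical helper named `models_that_fit`:

```python
from typing import List

# Context windows from the comparison table, assumed to be round token counts.
CONTEXT_WINDOW_TOKENS = {
    "Muse Spark": 262_000,
    "GPT-5.4": 128_000,
    "Claude Opus 4.6": 200_000,
    "Gemini 3.1 Pro": 1_000_000,
}


def models_that_fit(prompt_tokens: int, reserved_output_tokens: int = 0) -> List[str]:
    """Return the models whose context window holds the prompt plus reserved output."""
    needed = prompt_tokens + reserved_output_tokens
    return [name for name, window in CONTEXT_WINDOW_TOKENS.items() if needed <= window]


# A 250K-token document with 8K tokens reserved for the answer.
print(models_that_fit(250_000, reserved_output_tokens=8_000))
```

Under these assumptions a 258K-token request fits only Muse Spark and Gemini 3.1 Pro, while anything above 262K tokens leaves Gemini 3.1 Pro as the only option in this comparison.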

Community Feedback

### Developer and Researcher Reactions:

**Positive Reception:**

- **Health AI community** has shown particular enthusiasm, with researchers noting Muse Spark's physician-curated training data as a "game-changer" for medical AI applications.
- **Multimodal developers** praise the native integration of vision, with reports of Muse Spark outperforming other models on real-world visual reasoning tasks (e.g., appliance troubleshooting, chart analysis).
- **Cost-sensitive users** appreciate the free access, with reports of Meta AI app downloads jumping from #57 to #5 on the App Store post-launch.

**Critical Perspectives:**

- **Agentic AI developers** note the significant gap in autonomous task completion, with some teams continuing to use GPT-5.4 for complex workflow automation despite the cost.
- **Safety researchers** have flagged the high evaluation-awareness noted by Apollo Research, with ongoing debates about whether this represents aligned behavior or strategic deception.
- **Open-source advocates** express concern about Meta's shift from open weights, though some remain hopeful given Mark Zuckerberg's statement about future open-source Muse models.

**Adoption Patterns:**

- Early adoption is strongest in **health tech** startups and **consumer product** companies integrating AI into apps.
- **Enterprise adoption** is limited due to lack of API access and desktop integration.
- **Research community** is particularly interested in the Contemplating mode architecture, with several teams exploring parallel multi-agent reasoning frameworks inspired by Meta's approach.

Use Cases

### Specific Use Cases and When to Choose Muse Spark

1. **Health and Wellness Applications**
   - **Example**: An app that analyzes food photos and provides nutritional advice tailored to medical conditions (e.g., diabetes, high cholesterol).
   - **Why Muse Spark**: Its 42.8 HealthBench Hard score is more than double Gemini's 20.6. The physician-curated training data ensures factual, comprehensive health responses. Example use: Meta's own demos show interactive food label analysis with personalized health scores.
2. **Multimodal Visual Analysis and Education**
   - **Example**: Creating interactive educational content that explains scientific diagrams, charts, or technical equipment.
   - **Why Muse Spark**: Leads in chart understanding (CharXiv: 86.4) and performs strongly on visual STEM tasks. Can generate annotated, interactive diagrams directly from images. Example: Meta's demo of a coffee machine troubleshooting tutorial with bounding box annotations.
3. **Cost-Sensitive Consumer AI Features**
   - **Example**: Adding AI capabilities to a mobile app with millions of users where per-query costs are prohibitive.
   - **Why Muse Spark**: Completely free with exceptional token efficiency (58M tokens vs. 120M+ for competitors). The free tier includes all reasoning modes (Instant, Thinking, Contemplating). Example: Social media apps, educational tools, or health monitoring applications for broad audiences.
4. **Structured Multi-Step Reasoning Tasks**
   - **Example**: Complex scientific research questions, mathematical proofs, or multi-faceted analysis requiring parallel exploration of different approaches.
   - **Why Muse Spark**: Contemplating mode orchestrates multiple reasoning agents in parallel, scoring 50.2% on Humanity's Last Exam (no tools) versus 47% for GPT-5.4 Pro and 46% for Gemini Deep Think. Example: Meta's demonstration of parallel agent reasoning for hard problems while maintaining comparable latency.

**When NOT to Choose Muse Spark:**

- For autonomous desktop automation or complex coding workflows (choose GPT-5.4 or Claude Opus 4.6)
- When requiring open-source weights or extensive API integration (wait for future Muse releases)
- For abstract pattern recognition or novel problem-solving (ARC-AGI-2: 42.5 vs. 76+ for competitors)
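
The selection guidance above can be condensed into a small lookup. This is only a sketch: the task category names, the `recommend` function, and the `needs_public_api` flag are hypothetical labels invented for this example, and the mapping simply paraphrases the recommendations on this page.

```python
# Task categories are hypothetical labels; the recommendations paraphrase this page.
RECOMMENDATIONS = {
    "health": "Muse Spark",
    "chart_analysis": "Muse Spark",
    "cost_sensitive_consumer": "Muse Spark",
    "desktop_automation": "GPT-5.4",
    "coding": "Claude Opus 4.6",
    "abstract_reasoning": "GPT-5.4 or Gemini 3.1 Pro",
    "large_documents": "Gemini 3.1 Pro",
}


def recommend(task: str, needs_public_api: bool = False) -> str:
    """Return the suggested model for a task category, flagging the missing public API."""
    if task not in RECOMMENDATIONS:
        raise ValueError(f"Unknown task category: {task!r}")
    choice = RECOMMENDATIONS[task]
    # Muse Spark has no public API at the time of writing, per this page.
    if choice == "Muse Spark" and needs_public_api:
        return "No fit: Muse Spark has no public API yet"
    return choice


print(recommend("health"))
print(recommend("health", needs_public_api=True))
print(recommend("coding"))
```

A table-driven lookup like this keeps the guidance easy to revise as benchmark results or access options change.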

Latest News

### Recent Developments (as of April 2026):

1. **Launch and Availability (April 8, 2026)**: Muse Spark launched as the first model from Meta Superintelligence Labs, available immediately at meta.ai and in the Meta AI app. Rollout to Facebook, Instagram, WhatsApp, Messenger, and Ray-Ban Meta AI glasses is planned for the coming weeks.
2. **Contemplating Mode Rollout**: The multi-agent parallel reasoning mode is being rolled out gradually to users, enabling superior performance on challenging tasks without increased latency.
3. **Private API Preview**: Select partners have access to a private API preview, with public paid API access confirmed as "coming" by Alexandr Wang but no launch date announced.
4. **Safety Evaluations Released**: Meta published its Safety & Preparedness Report detailing Muse Spark's performance across frontier risk categories. The model showed strong refusal behavior in high-risk domains but exhibited high "evaluation awareness" (recognizing when it is being tested).
5. **Future Roadmap**: Meta confirmed larger Muse models are already in development, leveraging the rebuilt pretraining stack that achieves 10x greater compute efficiency than Llama 4 Maverick. The Hyperion data center provides infrastructure for scaling.
6. **Open-Source Possibility**: Mark Zuckerberg stated the Muse family will "include new open source models" in the future, though no timeline was provided for Muse Spark specifically.
7. **Benchmark Updates**: Third-party benchmarking (BenchLM.ai) shows Muse Spark with 1489 Arena Elo overall and a #18 ranking in multimodal tasks. It lacks sufficient coverage for a global ranking because only 39 of 221 tracked benchmarks have sourced evaluations.
8. **Industry Impact**: Analysts note Muse Spark's free pricing and health benchmark leadership have prompted competitive pressure, with discussions about potential pricing changes from other providers and increased focus on health AI applications.

Analysis generated: 2026-05-23