Qwen3.5-Omni-Plus
Qwen3.5-Omni-Plus is a multimodal large language model developed by Alibaba. It accepts text, image, audio, and video input and has a 256K-token context window.
Parameters
Undisclosed (the Deep Analysis section cites ~30B total, ~3B active per token)
Context Window
256K
License
https://huggingface.co/Qwen/Qwen2.5-72B/blob/main/LICENSE
Release Date
2026-03-30
API Pricing
API pricing for this model is not yet available
Strengths
- Advanced multimodal capabilities
- 256K long-context processing
- Efficient foundation model design
Weaknesses
- Closed license
- Commercial use constraints
- Restricted access permissions
Use Cases
- Large-scale document analysis
- Multimodal data processing
- Long-context analysis
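As a rough illustration of what the 256K window means for the long-context use cases above: 256K corresponds to 256 × 1,024 = 262,144 tokens, the figure listed under Deep Analysis. The sketch below checks whether a document plausibly fits. The 4-characters-per-token ratio and the 8,192-token reply budget are assumptions for illustration, not properties of the model's tokenizer, and `estimate_tokens` and `fits_in_context` are hypothetical helper names.

```python
CONTEXT_WINDOW_TOKENS = 256 * 1024  # 262,144 tokens

# Assumption: ~4 characters per token, a coarse heuristic for English text.
CHARS_PER_TOKEN = 4


def estimate_tokens(text: str) -> int:
    """Rough token estimate; a real tokenizer will give different counts."""
    return -(-len(text) // CHARS_PER_TOKEN)  # ceiling division


def fits_in_context(text: str, reserved_for_output: int = 8_192) -> bool:
    """Return True if the estimated prompt still leaves room for a reply."""
    return estimate_tokens(text) + reserved_for_output <= CONTEXT_WINDOW_TOKENS
```

For anything close to the limit, counting tokens with the model's actual tokenizer is the safer check, since the ratio varies widely across languages and across audio or video inputs.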
Deep Analysis
Release Date
March 30, 2026
Total Parameters
~30B
MoE with ~3B active per token
Architecture
Thinker-Talker, Hybrid-Attention MoE
Context Window
262,144 tokens
Max Audio Input
10+ hours continuous
Max Video Input
400+ seconds at 720p/1FPS
Speech Recognition
113 languages
Speech Generation
36 languages
MMAU (audio)
82.2
vs Gemini 3.1 Pro's 81.1
LibriSpeech WER
1.11 (clean), 2.23 (other)
Cuts Gemini's error rate by ~2/3
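For context on the LibriSpeech figures above, word error rate (WER) is conventionally the word-level edit distance between a reference transcript and the model's output, divided by the number of reference words and reported as a percentage. The following is a minimal sketch of that standard definition, not the evaluation code behind the numbers on this page. The function names are hypothetical, and the example values in the usage note are made up for illustration.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER in percent: (substitutions + deletions + insertions) / reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the processed reference
    # prefix and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,                      # deletion
                curr[j - 1] + 1,                  # insertion
                prev[j - 1] + substitution_cost,  # match or substitution
            )
        prev = curr
    return 100.0 * prev[-1] / len(ref)


def relative_reduction(baseline_wer: float, new_wer: float) -> float:
    """Fraction by which new_wer is lower than baseline_wer."""
    return 1.0 - new_wer / baseline_wer
```

With made-up inputs, `word_error_rate("the cat sat on the mat", "the cat sit on mat")` counts one substitution and one deletion over six reference words, and `relative_reduction(3.0, 1.0)` gives the kind of two-thirds cut described above. Published results usually also normalize casing and punctuation before scoring, which this sketch omits.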
Strengths
- 215 SOTA results across audio, audio-video, visual, and text benchmarks
- Best-in-class speech recognition: 113 languages, LibriSpeech WER 1.11 (about two-thirds lower than Gemini's)
- Native end-to-end multimodal: Thinker-Talker architecture jointly trained from scratch
- Voice cloning from short samples, with a Seed-zh stability score of 1.07 versus ElevenLabs' 13.08 (lower is better)
- Minimal text performance gap: MMLU-Redux 94.2 vs 94.3 for standard Qwen3.5-Plus
Weaknesses
- Requires ~40GB VRAM for comfortable local inference
- The 215-SOTA claim deserves skepticism: niche benchmarks inflate the count
- Voice cloning in real-world noisy environments not extensively validated
- API pricing not finalized at launch (listed as TBD)
- Multimodal architecture adds complexity for text-only use cases
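To put the ~40GB VRAM figure in context, here is a back-of-the-envelope estimate of weight memory for a ~30B-parameter model at a few common precisions. It assumes all experts stay resident in memory even though only ~3B parameters are active per token, and it ignores KV cache, activations, and the audio and vision encoders' runtime buffers. The precision table and `weight_memory_gb` are illustrative assumptions, not published requirements.

```python
TOTAL_PARAMS = 30e9  # ~30B total parameters, as listed under Deep Analysis

# Assumed storage cost per parameter for common weight formats.
BYTES_PER_PARAM = {"bf16": 2.0, "int8": 1.0, "int4": 0.5}


def weight_memory_gb(num_params: float, precision: str) -> float:
    """Memory for the weights alone, in decimal gigabytes."""
    return num_params * BYTES_PER_PARAM[precision] / 1e9


for precision in BYTES_PER_PARAM:
    gb = weight_memory_gb(TOTAL_PARAMS, precision)
    print(f"{precision}: ~{gb:.0f} GB for weights alone")
```

Under these assumptions, a ~40GB budget would be roughly consistent with 8-bit weights plus runtime overhead, though this page does not state which precision the figure assumes.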
Competitor Comparison
| Model | Arena | SWE | GPQA | Price |
|---|---|---|---|---|
| Gemini 3.1 Pro | ~1480 | N/A | ~91 | Proprietary |
| GPT-Audio | ~1460 | N/A | ~89 | Proprietary |
| Qwen3.5-Omni-Plus | N/A | N/A | N/A (MMLU-Redux 94.2) | TBD |
| ElevenLabs | N/A | N/A | N/A | Proprietary TTS |
| Minimax | N/A | N/A | N/A | Proprietary |
Qwen3.5-Omni-Plus is the flagship variant of the Qwen3.5-Omni family, a natively omnimodal model with ~30B total parameters (~3B active) that processes text, images, audio, and video while generating both text and streaming speech in a single forward pass. Released March 30, 2026, it claims 215 SOTA results and delivers best-in-class speech recognition (113 languages, WER 1.11) with voice stability that surpasses ElevenLabs.
Analysis generated: 2026-05-24