Alibaba | Proprietary

Qwen3.5-Omni-Plus

Qwen3.5-Omni-Plus is a multimodal large language model developed by Alibaba. It has a 256K-token (262,144) context window and accepts text, image, audio, and video input.

Parameters

Undisclosed (the Deep Analysis section below lists ~30B total, ~3B active per token)

Context Window

256K

License

https://huggingface.co/Qwen/Qwen2.5-72B/blob/main/LICENSE

Release Date

2026-03-30

API Pricing

API pricing for this model is not yet available

Strengths

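  • Multimodal input (text, images, audio, video) with text and streaming speech output
  • 256K-token long-context processing
  • Efficient mixture-of-experts design (~3B of ~30B parameters active per token)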

Weaknesses

  • Closed license
  • Commercial use constraints
  • Restricted access permissions

Use Cases

  • Large-scale document analysis
  • Multimodal data processing
  • Long-context analysis

Deep Analysis

Release Date

March 30, 2026

Total Parameters

~30B

MoE with ~3B active per token

Architecture

Thinker-Talker, Hybrid-Attention MoE

Context Window

262,144 tokens

Max Audio Input

10+ hours continuous

Max Video Input

400+ seconds at 720p, 1 FPS
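The context and video figures above can be related with some quick arithmetic. The sketch below is illustrative only: it checks that "256K" corresponds to 262,144 tokens and computes a hypothetical upper bound on tokens per video frame, assuming the entire window were spent on frames. The source does not describe how the model actually budgets tokens across modalities.

```python
CONTEXT_TOKENS = 262_144   # context window quoted above
VIDEO_SECONDS = 400        # "400+ seconds", lower bound
FRAMES_PER_SECOND = 1      # sampling rate quoted above

# "256K" uses binary thousands: 256 * 1024 tokens.
assert CONTEXT_TOKENS == 256 * 1024

# At 1 FPS, a 400-second clip yields 400 sampled frames.
frames = VIDEO_SECONDS * FRAMES_PER_SECOND

# Hypothetical upper bound: assumes every context token goes to video frames,
# which the source does not claim.
max_tokens_per_frame = CONTEXT_TOKENS // frames

print(f"frames: {frames}, upper bound per frame: {max_tokens_per_frame} tokens")
```

Any text prompt, audio track, or generated output would reduce the per-frame budget below this bound.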

Speech Recognition

113 languages

Speech Generation

36 languages

MMAU (audio)

82.2

vs Gemini 3.1 Pro's 81.1

LibriSpeech WER

1.11 (clean), 2.23 (other)

~2/3 lower than Gemini's error rate
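The "~2/3" figure is a relative error reduction. A minimal sketch of that arithmetic follows. This page does not state Gemini's WER, so the baseline computed here is only the value implied by taking "~2/3" literally, not a reported result. The function names are illustrative.

```python
def relative_reduction(baseline_wer: float, new_wer: float) -> float:
    """Fraction by which new_wer undercuts baseline_wer (0.5 = half the errors)."""
    if baseline_wer <= 0:
        raise ValueError("baseline_wer must be positive")
    return (baseline_wer - new_wer) / baseline_wer


def implied_baseline(new_wer: float, reduction: float) -> float:
    """Baseline WER that would make new_wer a `reduction`-fraction improvement."""
    if not 0 <= reduction < 1:
        raise ValueError("reduction must be in [0, 1)")
    return new_wer / (1 - reduction)


QWEN_CLEAN_WER = 1.11  # LibriSpeech (clean) figure quoted above

# Assumption: "~2/3" is treated as exactly 2/3; the real baseline is not given here.
baseline = implied_baseline(QWEN_CLEAN_WER, 2 / 3)
print(f"implied baseline WER: {baseline:.2f}")
print(f"round-trip reduction: {relative_reduction(baseline, QWEN_CLEAN_WER):.3f}")
```

Because both scores are small, a large relative reduction corresponds to an absolute gap of only a couple of points.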

Strengths

  • 215 SOTA results across audio, audio-video, visual, and text benchmarks
  • Best-in-class speech recognition: 113 languages, LibriSpeech WER 1.11 on the clean set (about two-thirds lower than Gemini's)
  • Native end-to-end multimodal: Thinker-Talker architecture jointly trained from scratch
  • Voice cloning from short samples, with a Seed-zh stability score of 1.07 versus ElevenLabs' 13.08 (lower is better)
  • Minimal text performance gap: MMLU-Redux 94.2 vs 94.3 for standard Qwen3.5-Plus

Weaknesses

  • Requires ~40GB VRAM for comfortable local inference
  • The 215-SOTA claim deserves skepticism: niche benchmarks likely inflate the count
  • Voice cloning in real-world noisy environments not extensively validated
  • API pricing not finalized at launch (listed as TBD)
  • Multimodal architecture adds complexity for text-only use cases
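The ~40GB VRAM figure is not broken down on this page. As a rough sanity check, the sketch below estimates memory for the weights alone at a few precisions, using the ~30B total and ~3B active parameter counts quoted above. It assumes all experts stay resident on the GPU (no offloading) and ignores the KV cache, activations, and any audio or vision encoders, so the real requirement would be higher than these numbers.

```python
def weight_memory_gb(total_params: float, bytes_per_param: float) -> float:
    """Memory for model weights alone, in GB (1 GB = 1e9 bytes)."""
    return total_params * bytes_per_param / 1e9


TOTAL_PARAMS = 30e9   # ~30B total parameters, as quoted above
ACTIVE_PARAMS = 3e9   # ~3B active per token, as quoted above

# With all experts resident, memory scales with TOTAL parameters,
# while per-token compute scales with ACTIVE parameters.
for label, nbytes in [("16-bit", 2.0), ("8-bit", 1.0), ("4-bit", 0.5)]:
    print(f"{label}: {weight_memory_gb(TOTAL_PARAMS, nbytes):.0f} GB of weights")

active_fraction = ACTIVE_PARAMS / TOTAL_PARAMS
print(f"active fraction per token: {active_fraction:.0%}")
```

Under these assumptions, 8-bit weights come to about 30 GB, which would be consistent with a ~40GB recommendation once runtime overhead is added. The page does not say which precision the figure refers to.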

Competitor Comparison

Model              Arena   SWE   GPQA           Price
Gemini 3.1 Pro     ~1480   N/A   ~91            Proprietary
GPT-Audio          ~1460   N/A   ~89            Proprietary
Qwen3.5-Omni-Plus  N/A     N/A   ~94.2 (MMLU)   TBD
ElevenLabs         N/A     N/A   N/A            Proprietary TTS
Minimax            N/A     N/A   N/A            Proprietary

Qwen3.5-Omni-Plus is the flagship variant of the Qwen3.5-Omni family: a natively omnimodal model with ~30B total parameters (~3B active per token) that processes text, images, audio, and video and generates both text and streaming speech in a single forward pass. Released March 30, 2026, it claims 215 SOTA results and reports best-in-class speech recognition (113 languages, LibriSpeech WER 1.11 on the clean set) along with voice-cloning stability that surpasses ElevenLabs.

Analysis generated: 2026-05-24