Alibaba · Proprietary

Qwen3.5-Omni-Flash

Qwen3.5-Omni-Flash is a multimodal large language model developed by Alibaba. It supports a 256K-token context window (262,144 tokens), so long documents and extended audio or video inputs can be processed in a single request.

Parameters

Undisclosed

Context Window

256K

License

https://huggingface.co/Qwen/Qwen2.5-72B/blob/main/LICENSE

Release Date

2026-03-30

API Pricing

API pricing for this model is not yet available

Strengths

  • Advanced multimodal processing
  • 256K long context
  • High-speed response performance

Weaknesses

  • Closed licensing system
  • Lack of detailed evaluation metrics
  • Limited track record as a new model

Use Cases

  • Large-scale document analysis
  • Multimodal information processing
  • Real-time response applications

Deep Analysis

Release Date

March 30, 2026

Architecture

Thinker-Talker, Hybrid-Attention MoE

Context Window

262,144 tokens

Max Audio Input

10+ hours continuous

Max Video Input

400+ seconds at 720p/1FPS

Speech Recognition

113 languages

Speech Generation

36 languages

Input Modalities

Text, Image, Audio, Video

Output Modalities

Text, Streaming Speech

API Price

~$0.065 per 1M text input tokens, $0.260 per 1M output tokens

Budget tier of Omni family
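
To make the listed API price concrete, here is a minimal sketch of a per-request cost estimate. It assumes the approximate rates above ($0.065 per 1M text input tokens, $0.260 per 1M output tokens) and covers text-only traffic, since this page lists no rates for image, audio, or video input. The helper name `estimate_cost_usd` is hypothetical and is not part of any Alibaba SDK.

```python
from decimal import Decimal

# Approximate list prices taken from this page (USD per 1M tokens).
# Assumption: text-only traffic; other input modalities are not priced here.
INPUT_USD_PER_MILLION = Decimal("0.065")
OUTPUT_USD_PER_MILLION = Decimal("0.260")
TOKENS_PER_MILLION = Decimal(1_000_000)


def estimate_cost_usd(input_tokens: int, output_tokens: int) -> Decimal:
    """Hypothetical helper: estimated USD cost of one text-only request."""
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    # Scale each token count by its per-million rate, then sum.
    input_cost = Decimal(input_tokens) * INPUT_USD_PER_MILLION / TOKENS_PER_MILLION
    output_cost = Decimal(output_tokens) * OUTPUT_USD_PER_MILLION / TOKENS_PER_MILLION
    return input_cost + output_cost


if __name__ == "__main__":
    # Example: a long prompt (200,000 tokens) with a short reply (2,000 tokens).
    print(estimate_cost_usd(200_000, 2_000))
```

At these rates, a request with 1M input tokens and 1M output tokens would cost about $0.325. Output tokens are four times as expensive as input tokens, so workloads with long prompts and short replies gain the most from this tier.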

Strengths

  • Budget-friendly omnimodal model: text, image, audio, and video input with speech output
  • Natively end-to-end multimodal — no adapter or separate TTS pipeline needed
  • 113 languages for speech recognition, 36 for speech generation
  • Low latency for real-time voice chat applications
  • Apache 2.0 licensed, available for self-hosting via HuggingFace

Weaknesses

  • Lower quality than the Plus variant on audio and vision benchmarks
  • Benchmark scores trail Gemini 3.1 Pro on several audio understanding tasks
  • Limited documentation on specific parameter count and architecture details
  • Voice cloning quality may not match dedicated TTS solutions
  • Real-world performance in noisy environments not extensively tested

Competitor Comparison

Model              | Arena | SWE | GPQA         | Price
Qwen3.5-Omni-Plus  | N/A   | N/A | ~94.2 (MMLU) | TBD
Gemini 3.1 Pro     | ~1480 | N/A | ~91          | Proprietary
GPT-Audio          | ~1460 | N/A | ~89          | Proprietary
Qwen3.5-Omni-Flash | N/A   | N/A | ~92 (MMLU)   | $0.065/$0.260
ElevenLabs         | N/A   | N/A | N/A          | Proprietary TTS

Qwen3.5-Omni-Flash is the budget tier of the Qwen3.5-Omni family, released March 30, 2026. It is a natively omnimodal model that accepts text, images, audio, and video as input and produces both text and streaming speech output in a single forward pass. The Flash variant trades some benchmark quality for lower latency and cost, making it suitable for real-time voice chat and high-volume applications.

Analysis generated: 2026-05-24