Qwen3.5-Omni-Flash
Qwen3.5-Omni-Flash is a multimodal large language model developed by Alibaba. It supports a 256K-token (262,144-token) context window, which suits long inputs such as large documents.
Parameters
Undisclosed
Context Window
256K
License
https://huggingface.co/Qwen/Qwen2.5-72B/blob/main/LICENSE
Release Date
2026-03-30
API Pricing
API pricing for this model is not yet available
Strengths
- Advanced multimodal processing
- 256K long context
- High-speed response performance
Weaknesses
- Closed licensing system
- Lack of detailed evaluation metrics
- Limited track record as a new model
Use Cases
- Large-scale document analysis
- Multimodal information processing
- Real-time response applications
Deep Analysis
Release Date
March 30, 2026
Architecture
Thinker-Talker, Hybrid-Attention MoE
Context Window
262,144 tokens
Max Audio Input
10+ hours continuous
Max Video Input
400+ seconds at 720p/1FPS
Speech Recognition
113 languages
Speech Generation
36 languages
Input Modalities
Text, Image, Audio, Video
Output Modalities
Text, Streaming Speech
API Price
~$0.065 per 1M text input tokens, $0.260 per 1M output tokens (budget tier of the Omni family)
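The limits in the specification list above can be turned into a simple pre-flight check. The following is a minimal sketch in Python, not an official client: the names `MediaRequest`, `video_frames`, and `check_limits` are hypothetical, and it treats the "10+ hours" and "400+ seconds" figures as conservative documented values rather than hard caps. How many context tokens audio and video consume is not stated here, so only text tokens are compared against the context window.

```python
from dataclasses import dataclass

# Figures copied from the specification list; everything else is illustrative.
CONTEXT_WINDOW_TOKENS = 262_144        # 256 * 1024 tokens
AUDIO_SECONDS_DOCUMENTED = 10 * 3600   # "10+ hours continuous"
VIDEO_SECONDS_DOCUMENTED = 400         # "400+ seconds at 720p/1FPS"
VIDEO_SAMPLE_FPS = 1                   # sampling rate quoted for video input


@dataclass(frozen=True)
class MediaRequest:
    """Hypothetical description of one request's input sizes."""
    text_tokens: int = 0
    audio_seconds: float = 0.0
    video_seconds: float = 0.0


def video_frames(video_seconds: float, fps: int = VIDEO_SAMPLE_FPS) -> int:
    """Number of frames sampled from a clip at the quoted frame rate."""
    return int(video_seconds * fps)


def check_limits(req: MediaRequest) -> list[str]:
    """Return a warning for each input that exceeds a documented figure."""
    warnings = []
    if req.text_tokens > CONTEXT_WINDOW_TOKENS:
        warnings.append(
            f"text: {req.text_tokens} tokens exceeds {CONTEXT_WINDOW_TOKENS}"
        )
    if req.audio_seconds > AUDIO_SECONDS_DOCUMENTED:
        warnings.append(
            f"audio: {req.audio_seconds}s exceeds {AUDIO_SECONDS_DOCUMENTED}s"
        )
    if req.video_seconds > VIDEO_SECONDS_DOCUMENTED:
        warnings.append(
            f"video: {req.video_seconds}s exceeds {VIDEO_SECONDS_DOCUMENTED}s"
        )
    return warnings


# A 500-second clip is past the documented 400-second figure, so it is flagged.
print(check_limits(MediaRequest(text_tokens=1_000, video_seconds=500)))
```

Because the source figures are written as "10+" and "400+", a warning from this check means "beyond what is documented", not necessarily "rejected by the model".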
Strengths
- Budget-friendly omnimodal model: text, image, audio, and video input with speech output
- Natively end-to-end multimodal: no adapter or separate TTS pipeline needed
- 113 languages for speech recognition, 36 for speech generation
- Low latency for real-time voice chat applications
- Apache 2.0 licensed, available for self-hosting via HuggingFace
Weaknesses
- Lower quality than the Plus variant on audio and vision benchmarks
- Benchmark scores trail Gemini 3.1 Pro on several audio understanding tasks
- Limited documentation on specific parameter count and architecture details
- Voice cloning quality may not match dedicated TTS solutions
- Real-world performance in noisy environments not extensively tested
Competitor Comparison
| Model | Arena | SWE-bench | GPQA (or MMLU where noted) | Price (USD per 1M tokens, input/output) |
|---|---|---|---|---|
| Qwen3.5-Omni-Plus | N/A | N/A | ~94.2 (MMLU) | TBD |
| Gemini 3.1 Pro | ~1480 | N/A | ~91 | Proprietary |
| GPT-Audio | ~1460 | N/A | ~89 | Proprietary |
| Qwen3.5-Omni-Flash | N/A | N/A | ~92 (MMLU) | $0.065/$0.260 |
| ElevenLabs | N/A | N/A | N/A | Proprietary TTS |
Qwen3.5-Omni-Flash is the budget tier of the Qwen3.5-Omni family, released March 30, 2026. It is a natively omnimodal model that accepts text, images, audio, and video as input and produces both text and streaming speech output in a single forward pass. The Flash variant trades some benchmark quality for lower latency and cost, making it suitable for real-time voice chat and high-volume applications.
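For the high-volume use mentioned above, a rough cost estimate follows directly from the listed prices. The sketch below is a minimal illustration, assuming the approximate text rates quoted in this profile (about $0.065 per 1M input tokens and $0.260 per 1M output tokens). The function name `estimate_cost_usd` and the example traffic figures are hypothetical, and rates for audio or video input are not stated here, so they are not modeled.

```python
from decimal import Decimal

# Approximate rates from this profile, in USD per 1M tokens (text only).
INPUT_USD_PER_MTOK = Decimal("0.065")
OUTPUT_USD_PER_MTOK = Decimal("0.260")
TOKENS_PER_MTOK = Decimal(1_000_000)


def estimate_cost_usd(input_tokens: int, output_tokens: int) -> Decimal:
    """Estimate the text-token cost of a workload at the listed rates."""
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    input_cost = Decimal(input_tokens) * INPUT_USD_PER_MTOK
    output_cost = Decimal(output_tokens) * OUTPUT_USD_PER_MTOK
    return (input_cost + output_cost) / TOKENS_PER_MTOK


# Hypothetical workload: 10,000 requests per day,
# each with 2,000 input tokens and 500 output tokens.
requests_per_day = 10_000
daily_cost = estimate_cost_usd(
    input_tokens=requests_per_day * 2_000,
    output_tokens=requests_per_day * 500,
)
print(f"Estimated daily cost: ${daily_cost:.2f}")
```

At these rates the hypothetical workload costs about $2.60 per day, split evenly between input and output: output tokens are four times the price of input tokens, and the example uses four times as many input tokens as output tokens.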
Analysis generated: 2026-05-24