Qwen3.5-Omni-Plus
Qwen3.5-Omni-Plus is a multimodal large language model developed by Alibaba. It features a large 256K context window and advanced information-processing capabilities.
Parameters
Undisclosed
Context
256K
License
https://huggingface.co/Qwen/Qwen2.5-72B/blob/main/LICENSE
Release Date
2026-03-30
API Pricing
API pricing for this model has not been published yet.
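Strengths
- Advanced multimodal capabilities
- 256K long-context processing
- Efficient foundation model design
Weaknesses
- Closed license
- Restrictions on commercial use
- Restricted access
Use Cases
- Large-scale document analysis
- Multimodal data processing
- Long-context analysis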
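For the long-document use cases above, a rough pre-check of whether a prompt fits the 256K window can be sketched as follows. This is a minimal sketch, not the model's tokenizer: the 4-characters-per-token ratio and the 4,096-token output reserve are assumptions chosen for illustration, and the helper names are hypothetical.

```python
import math

# "256K" from the card, i.e. 262,144 tokens.
CONTEXT_WINDOW_TOKENS = 256 * 1024

# ASSUMPTION: rough heuristic for English text, not the model's real tokenizer.
CHARS_PER_TOKEN = 4.0


def estimate_tokens(text: str, chars_per_token: float = CHARS_PER_TOKEN) -> int:
    """Roughly estimate the token count from character length, rounded up."""
    return math.ceil(len(text) / chars_per_token)


def fits_in_context(text: str, reserved_for_output: int = 4096) -> bool:
    """Return True if the estimated prompt still leaves room for the reply.

    reserved_for_output is a hypothetical budget for generated tokens.
    """
    return estimate_tokens(text) + reserved_for_output <= CONTEXT_WINDOW_TOKENS
```

The ratio differs by language and content (Korean or source code will tokenize differently), so a real tokenizer count should replace the heuristic before relying on the result.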
In-Depth Analysis
Release Date
March 30, 2026
Total Parameters
~30B
MoE with ~3B active per token
Architecture
Thinker-Talker, Hybrid-Attention MoE
Context Window
262,144 tokens
Max Audio Input
10+ hours continuous
Max Video Input
400+ seconds at 720p/1FPS
Speech Recognition
113 languages
Speech Generation
36 languages
MMAU (audio)
82.2
vs Gemini 3.1 Pro's 81.1
LibriSpeech WER
1.11 (clean), 2.23 (other)
Cuts Gemini's error rate by ~2/3
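The figures in the spec list above can be cross-checked with a little arithmetic. The sketch below uses only numbers stated on this card; the implied baseline WER is derived from the card's own "~2/3" claim and is not a reported measurement.

```python
# Values copied from the spec list on this card.
CONTEXT_TOKENS = 262_144
TOTAL_PARAMS_B = 30.0      # ~30B total parameters
ACTIVE_PARAMS_B = 3.0      # ~3B active per token
VIDEO_SECONDS = 400        # "400+ seconds", so this is a lower bound
VIDEO_FPS = 1
WER_CLEAN = 1.11           # LibriSpeech clean
CLAIMED_REDUCTION = 2 / 3  # "cuts Gemini's error rate by ~2/3"

# 262,144 tokens is exactly 256 * 1024, which is the "256K" in the summary.
context_in_k = CONTEXT_TOKENS // 1024

# Share of the parameters that is active for each token in the MoE.
active_fraction = ACTIVE_PARAMS_B / TOTAL_PARAMS_B

# Minimum number of frames sampled from a maximum-length video at 1 FPS.
min_frames = VIDEO_SECONDS * VIDEO_FPS

# DERIVED, not reported: the baseline WER implied if 1.11 is ~2/3 lower.
implied_baseline_wer = WER_CLEAN / (1 - CLAIMED_REDUCTION)

print(context_in_k, active_fraction, min_frames, round(implied_baseline_wer, 2))
```

Roughly one tenth of the parameters is active per token, and the "~2/3" claim would correspond to a baseline WER of about 3.3 on the same test set.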
Strengths
- 215 SOTA results across audio, audio-video, visual, and text benchmarks
- Best-in-class speech recognition: 113 languages, LibriSpeech WER 1.11 (about two-thirds lower than Gemini's)
- Native end-to-end multimodal: Thinker-Talker architecture jointly trained from scratch
- Voice cloning from short samples with a Seed-zh stability score of 1.07 (beats ElevenLabs' 13.08)
- Minimal text performance gap: MMLU-Redux 94.2 vs 94.3 for standard Qwen3.5-Plus
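To put the head-to-head numbers in this list and the spec sheet side by side, the margins can be computed directly. This is a small sketch over the scores quoted on this card; treating a lower Seed-zh score as better is an assumption inferred from the card's wording ("beats").

```python
# Scores quoted on this card: (Qwen3.5-Omni-Plus, comparison model).
MMAU = (82.2, 81.1)          # vs Gemini 3.1 Pro, higher is better
MMLU_REDUX = (94.2, 94.3)    # vs standard Qwen3.5-Plus, higher is better
SEED_ZH = (1.07, 13.08)      # vs ElevenLabs; ASSUMPTION: lower is better

# Absolute lead on the MMAU audio benchmark, in points.
mmau_margin = round(MMAU[0] - MMAU[1], 1)

# Points lost on text relative to the text-only sibling model.
text_gap = round(MMLU_REDUX[1] - MMLU_REDUX[0], 1)

# How many times larger the competitor's Seed-zh score is.
stability_ratio = SEED_ZH[1] / SEED_ZH[0]

print(mmau_margin, text_gap, round(stability_ratio, 1))
```

The MMAU lead is about one point and the text gap about a tenth of a point, while the Seed-zh scores differ by roughly a factor of twelve.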
Weaknesses
- Requires ~40GB VRAM for comfortable local inference
- 215 SOTA claim deserves skepticism, since niche benchmarks inflate the count
- Voice cloning in real-world noisy environments not extensively validated
- API pricing not fully finalized at launch (TBD status)
- Multimodal architecture adds complexity for text-only use cases
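The ~40GB VRAM figure above can be sanity-checked against the ~30B parameter count with a weights-only estimate. This is a rough sketch: the card does not state the inference precision, so the bit widths below are assumptions, and the estimate ignores KV cache, activations, and any audio or vision encoder overhead.

```python
def weight_memory_gib(params_billion: float, bits_per_param: int) -> float:
    """Memory for the model weights alone, in GiB (2**30 bytes)."""
    bytes_total = params_billion * 1e9 * bits_per_param / 8
    return bytes_total / 2**30


# ASSUMPTION: common weight precisions; the card does not say which is used.
for bits in (16, 8, 4):
    print(f"{bits}-bit weights: {weight_memory_gib(30.0, bits):.1f} GiB")
```

Under these assumptions the weights alone need roughly 56 GiB at 16-bit and 28 GiB at 8-bit, so a ~40GB figure is plausible for a quantized deployment with headroom for runtime buffers. All experts must be resident even though only ~3B parameters are active per token.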
Competitor Comparison
| Model | Arena | SWE | GPQA | Price |
|---|---|---|---|---|
| Gemini 3.1 Pro | ~1480 | N/A | ~91 | Proprietary |
| GPT-Audio | ~1460 | N/A | ~89 | Proprietary |
| Qwen3.5-Omni-Plus | N/A | N/A | ~94.2 (MMLU-Redux) | TBD |
| ElevenLabs | N/A | N/A | N/A | Proprietary TTS |
| Minimax | N/A | N/A | N/A | Proprietary |
Qwen3.5-Omni-Plus is the flagship variant of the Qwen3.5-Omni family: a natively omnimodal model with ~30B total parameters (~3B active per token) that processes text, images, audio, and video and generates both text and streaming speech in a single forward pass. Released March 30, 2026, it claims 215 SOTA results and reports best-in-class speech recognition (113 languages, LibriSpeech clean WER 1.11) along with a Seed-zh voice stability score that surpasses ElevenLabs (1.07 versus 13.08).
Sources
Analysis generated: 2026-05-24