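What are this model's strengths?

Advanced multimodal capabilities, 256K long-context processing, and an efficient foundation-model design.

What are this model's weaknesses?

Closed license, restrictions on commercial use, and limited access.

What is it best suited for?

Large-scale document analysis, multimodal data processing, and long-context analysis.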

Alibaba · Proprietary

Qwen3.5-Omni-Plus

Name: Qwen3.5-Omni-Plus
Author: Alibaba

Qwen3.5-Omni-Plus is a multimodal large language model developed by Alibaba. It features a large 256K-token context window and advanced information-processing capabilities.
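A 256K window here equals 256 × 1024 = 262,144 tokens, the figure the in-depth analysis lists. The minimal budgeting sketch below assumes that prompt tokens and generated tokens share one window (typical for decoder-style models, but not confirmed for this model on this page). `fits_in_context` is a hypothetical helper; real token counts would come from the model's own tokenizer.

```python
# "256K" in the binary sense: 256 * 1024 tokens.
CONTEXT_WINDOW = 256 * 1024  # 262,144 tokens

def fits_in_context(prompt_tokens: int, max_output_tokens: int,
                    window: int = CONTEXT_WINDOW) -> bool:
    """Return True if the prompt plus the reserved output budget fits the window.

    Assumption: input and output draw from the same token budget.
    """
    if prompt_tokens < 0 or max_output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    return prompt_tokens + max_output_tokens <= window

print(CONTEXT_WINDOW)
print(fits_in_context(250_000, 8_192))  # 258,192 tokens in total
print(fits_in_context(260_000, 8_192))  # 268,192 tokens in total
```

Reserving the output budget up front avoids a long document silently crowding out room for the answer.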

Parameters

Undisclosed

Context

256K

License

https://huggingface.co/Qwen/Qwen2.5-72B/blob/main/LICENSE

Release date

2026-03-30

API pricing

API pricing for this model has not been published yet.

Strengths

・Advanced multimodal capabilities
・256K long-context processing
・Efficient foundation-model design

Weaknesses

・Closed license
・Restrictions on commercial use
・Limited access

Use cases

・Large-scale document analysis
・Multimodal data processing
・Long-context analysis

In-depth analysis

Release Date

March 30, 2026

Total Parameters

~30B

MoE with ~3B active per token

Architecture

Thinker-Talker, Hybrid-Attention MoE

Context Window

262,144 tokens

Max Audio Input

10+ hours continuous

Max Video Input

400+ seconds at 720p/1FPS

Speech Recognition

113 languages

Speech Generation

36 languages

MMAU (audio)

82.2

vs Gemini 3.1 Pro's 81.1

LibriSpeech WER

1.11 (clean), 2.23 (other)

Cuts Gemini 3.1 Pro's error rate by ~2/3 (clean) and ~1/2 (other)
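An "error rate cut" is a relative reduction: (baseline WER - model WER) / baseline WER. The sketch below applies it to both LibriSpeech splits, assuming the 3.36 and 4.41 baselines quoted in the benchmark section are Gemini 3.1 Pro's, as the detailed comparison implies. `relative_reduction` is an illustrative helper name.

```python
def relative_reduction(baseline: float, model: float) -> float:
    """Fraction by which `model` lowers an error metric (e.g. WER) versus `baseline`."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return (baseline - model) / baseline

# LibriSpeech WER as quoted on this page (lower is better).
# Assumption: the comparison numbers are Gemini 3.1 Pro's.
WER = {
    "test-clean": {"qwen": 1.11, "gemini": 3.36},
    "test-other": {"qwen": 2.23, "gemini": 4.41},
}

for split, s in WER.items():
    r = relative_reduction(s["gemini"], s["qwen"])
    print(f"{split}: WER {r:.0%} lower")
```

This works out to about 67% on test-clean but only about 49% on test-other, so "two-thirds" describes the clean split, while the harder split sees roughly a halving.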

Strengths

・215 SOTA results across audio, audio-video, visual, and text benchmarks
・Best-in-class speech recognition: 113 languages, LibriSpeech test-clean WER 1.11 (about 2/3 lower than Gemini 3.1 Pro)
・Native end-to-end multimodal: Thinker-Talker architecture jointly trained from scratch
・Voice cloning from short samples, with a Seed-zh stability score of 1.07 (lower is better; ElevenLabs scores 13.08)
・Minimal text performance gap: MMLU-Redux 94.2 vs 94.3 for standard Qwen3.5-Plus
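The Thinker-Talker split can be pictured as a two-stage streaming pipeline: the Thinker produces the response as text tokens, and the Talker turns them into speech as they arrive, so audio can begin before the reply is complete. The toy sketch below models only that data flow with Python generators. `thinker`, `talker`, and the byte "audio" are hypothetical placeholders; the real Talker reportedly conditions on the Thinker's internal representations, not just plain text.

```python
from typing import Iterator, Tuple

def thinker(prompt: str) -> Iterator[str]:
    """Toy Thinker: yields the reply one text token at a time.

    Hypothetical stand-in; the real Thinker also encodes image, audio, and video.
    """
    reply = f"You said: {prompt}"
    for token in reply.split():
        yield token

def talker(text_tokens: Iterator[str]) -> Iterator[Tuple[str, bytes]]:
    """Toy Talker: emits a speech chunk for each text token as soon as it arrives.

    It pulls from the Thinker lazily, so output starts before the full reply
    exists. The bytes are placeholders, not real audio.
    """
    for token in text_tokens:
        yield token, token.encode("utf-8")

for token, chunk in talker(thinker("hello there")):
    print(f"{token!r}: {len(chunk)} bytes of placeholder audio")
```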

Weaknesses

・Requires ~40GB VRAM for comfortable local inference
・The 215 SOTA claim deserves skepticism: niche benchmarks inflate the count
・Voice cloning in real-world noisy environments not extensively validated
・API pricing not fully finalized at launch (TBD status)
・Multimodal architecture adds complexity for text-only use cases
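Every expert must stay loaded even though only ~3B parameters run per token, so memory scales with the ~30B total rather than the active count. The back-of-envelope sketch below counts weight memory only, at standard bytes-per-parameter for each format. It ignores KV cache, activations, the audio/vision encoders, and framework overhead. It suggests the ~40GB figure is plausible for roughly 8-bit weights plus headroom, though the page does not say which precision that figure assumes.

```python
# Approximate total parameter count quoted on this page (MoE: all experts resident).
TOTAL_PARAMS = 30e9

# Standard storage cost per parameter for common formats.
BYTES_PER_PARAM = {"bf16": 2.0, "int8": 1.0, "int4": 0.5}

def weight_memory_gb(params: float, bytes_per_param: float) -> float:
    """Weights only, in GB (1e9 bytes); excludes KV cache, activations, overhead."""
    return params * bytes_per_param / 1e9

for fmt, b in BYTES_PER_PARAM.items():
    print(f"{fmt}: ~{weight_memory_gb(TOTAL_PARAMS, b):.0f} GB of weights")
```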

Competitor comparison

Model	Arena	SWE-bench	GPQA	Price
Gemini 3.1 Pro	~1480	N/A	~91	Proprietary
GPT-Audio	~1460	N/A	~89	Proprietary
Qwen3.5-Omni-Plus	N/A	N/A	N/A (MMLU-Redux 94.2)	TBD
ElevenLabs	N/A	N/A	N/A	Proprietary TTS
MiniMax	N/A	N/A	N/A	Proprietary

Overview

Qwen3.5-Omni-Plus is the flagship variant of the Qwen3.5-Omni family: a natively omnimodal model with ~30B total parameters (~3B active per token) that takes text, images, audio, and video as input and generates both text and streaming speech within a single end-to-end model. Released March 30, 2026, it claims 215 SOTA results and delivers best-in-class speech recognition (113 languages, LibriSpeech test-clean WER 1.11), with voice-cloning stability that surpasses ElevenLabs.
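The ~30B total / ~3B active split describes a mixture-of-experts (MoE) layout: for each token a router scores the experts, and only the top-scoring few are run. The sketch below shows generic top-k softmax gating in plain Python. The 8 router scores and `k=2` are hypothetical; the page does not describe Qwen3.5-Omni's expert count or routing rule, so this is the common textbook scheme, not the model's own.

```python
import math

# Approximate figures quoted on this page.
TOTAL_PARAMS = 30e9
ACTIVE_PARAMS = 3e9

def active_fraction(total: float, active: float) -> float:
    """Share of the weights that take part in one token's forward pass."""
    return active / total

def top_k_route(logits: list[float], k: int) -> list[tuple[int, float]]:
    """Select the k highest-scoring experts and softmax-normalize their weights."""
    if not 1 <= k <= len(logits):
        raise ValueError("k must be between 1 and the number of experts")
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    m = max(logits[i] for i in top)  # subtract the max for numerical stability
    exps = {i: math.exp(logits[i] - m) for i in top}
    z = sum(exps.values())
    return [(i, exps[i] / z) for i in top]

print(f"active fraction: {active_fraction(TOTAL_PARAMS, ACTIVE_PARAMS):.0%}")
# Hypothetical router scores for 8 experts; only 2 run for this token.
print(top_k_route([0.1, 2.0, -1.0, 0.5, 1.5, 0.0, -0.3, 0.2], k=2))
```

The token's output is then the weighted sum of the selected experts' outputs, which is why compute per token tracks the ~3B active parameters rather than the ~30B total.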

Benchmarks and performance

Headline benchmarks (LibriSpeech WER and Seed-zh stability are lower-is-better; the rest are higher-is-better):
・Audio understanding (MMAU): 82.2 vs Gemini 3.1 Pro's 81.1
・Audio dialogue (VoiceBench): 93.1 vs 88.9
・Speech recognition (LibriSpeech WER): 1.11 vs 3.36 on test-clean, 2.23 vs 4.41 on test-other
・Text: MMLU-Redux 94.2, C-Eval 92.0
・Visual: MMMU-Pro 73.9
・Voice cloning (Seed-zh stability): 1.07 vs ElevenLabs' 13.08 and Gemini 2.5 Pro's 2.42

The text performance gap vs standard Qwen3.5-Plus is minimal (94.2 vs 94.3 on MMLU-Redux).

Detailed comparison

Against Gemini 3.1 Pro, it cuts the speech recognition error rate by roughly two-thirds on LibriSpeech test-clean (1.11 vs 3.36) and by about half on test-other (2.23 vs 4.41). On VoiceBench, audio dialogue accuracy is about 4 percentage points higher (93.1 vs 88.9). Voice-cloning stability is an order of magnitude better than ElevenLabs' (1.07 vs 13.08, lower is better). Text performance is essentially equivalent to the non-omni Qwen3.5-Plus, suggesting the multimodal architecture does not meaningfully sacrifice text quality.
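The remaining margins can be checked directly from the quoted scores. The sketch below computes absolute point gaps for higher-is-better benchmarks and "times lower" ratios for the lower-is-better Seed-zh metric. The VoiceBench comparison score is labeled `reference` because the page does not name its model explicitly, and lower-is-better for Seed-zh is inferred from the page's wording.

```python
# Scores as quoted on this page. VoiceBench, MMAU, MMLU-Redux: higher is better.
# Seed-zh stability: lower is better (inferred from the page's "beats" wording).
VOICEBENCH = {"Qwen3.5-Omni-Plus": 93.1, "reference": 88.9}
MMAU = {"Qwen3.5-Omni-Plus": 82.2, "Gemini 3.1 Pro": 81.1}
SEED_ZH = {"Qwen3.5-Omni-Plus": 1.07, "ElevenLabs": 13.08, "Gemini 2.5 Pro": 2.42}
MMLU_REDUX = {"Qwen3.5-Omni-Plus": 94.2, "Qwen3.5-Plus": 94.3}

def point_gap(ours: float, theirs: float) -> float:
    """Absolute difference in score points, rounded to hide float noise."""
    return round(ours - theirs, 2)

def times_lower(theirs: float, ours: float) -> float:
    """For lower-is-better metrics: how many times smaller our score is."""
    return round(theirs / ours, 1)

omni = "Qwen3.5-Omni-Plus"
print("VoiceBench gap:", point_gap(VOICEBENCH[omni], VOICEBENCH["reference"]))
print("MMAU gap:", point_gap(MMAU[omni], MMAU["Gemini 3.1 Pro"]))
print("Seed-zh vs ElevenLabs:", times_lower(SEED_ZH["ElevenLabs"], SEED_ZH[omni]), "x")
print("Seed-zh vs Gemini 2.5 Pro:", times_lower(SEED_ZH["Gemini 2.5 Pro"], SEED_ZH[omni]), "x")
print("MMLU-Redux gap vs Qwen3.5-Plus:", point_gap(MMLU_REDUX[omni], MMLU_REDUX["Qwen3.5-Plus"]))
```

The results are about 4.2 points on VoiceBench, 1.1 on MMAU, roughly 12x lower than ElevenLabs and 2.3x lower than Gemini 2.5 Pro on Seed-zh, and a 0.1-point text deficit. The ~12x ratio is what "an order of magnitude" refers to.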

Community reception

The model generated significant excitement for its native multimodal approach: it is not an adapter bolted onto a language model. Its Audio-Visual Vibe Coding capability (point a camera, describe a UI, get code) captured developers' imagination. The community regards the Thinker-Talker joint training as a genuine architectural innovation, while showing some healthy skepticism about the 215 SOTA count. The voice-cloning comparison against ElevenLabs was widely shared.

Use cases

Ideal for building voice assistants, real-time translation systems, accessibility tools, audio/video content analysis, voice-cloned narration, and multimodal research. The 113-language speech recognition makes it uniquely valuable for multilingual applications. The streaming speech output enables natural conversational AI. For text-only workloads, the standard Qwen3.5-Plus is simpler and cheaper. For budget use cases, the Flash variant offers lower latency and cost.
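The variant guidance above amounts to a small decision rule. The sketch below is a hypothetical helper: the `Workload` fields are illustrative, the returned strings are display names rather than confirmed API model IDs, and "Qwen3.5-Omni-Flash" is an assumed name for the Flash variant the text mentions.

```python
from dataclasses import dataclass

@dataclass
class Workload:
    # Hypothetical fields describing what an application needs.
    text_only: bool       # no image, audio, or video input and no speech output
    cost_sensitive: bool  # budget or latency matters more than peak quality

def suggest_variant(w: Workload) -> str:
    """Encode this section's rule of thumb; returns a display name, not an API ID."""
    if w.text_only:
        return "Qwen3.5-Plus"        # simpler and cheaper for text-only work
    if w.cost_sensitive:
        return "Qwen3.5-Omni-Flash"  # assumed name of the Flash variant
    return "Qwen3.5-Omni-Plus"       # full multimodal flagship

print(suggest_variant(Workload(text_only=False, cost_sensitive=False)))
```

Checking text-only first mirrors the text's point that the omni architecture adds complexity without benefit when no audio, image, or video is involved.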