Alibaba · Proprietary

Qwen3.5-Omni-Plus

Qwen3.5-Omni-Plus is a multimodal large language model developed by Alibaba. It offers a 256K-token context window and advanced information-processing capabilities.

Parameters

Undisclosed

Context

256K

License

https://huggingface.co/Qwen/Qwen2.5-72B/blob/main/LICENSE

Release date

2026-03-30

API pricing

API pricing for this model has not been published yet.

Strengths

  • Advanced multimodal capabilities
  • Long-context processing up to 256K tokens
  • Efficient foundation-model design

Weaknesses

  • Closed license
  • Restrictions on commercial use
  • Restricted access

Use cases

  • Large-scale document analysis
  • Multimodal data processing
  • Long-context analysis

In-depth analysis

Release Date

March 30, 2026

Total Parameters

~30B

MoE with ~3B active per token

Architecture

Thinker-Talker, Hybrid-Attention MoE

Context Window

262,144 tokens

Max Audio Input

10+ hours continuous

Max Video Input

400+ seconds at 720p/1FPS

Speech Recognition

113 languages

Speech Generation

36 languages

MMAU (audio)

82.2

vs Gemini 3.1 Pro's 81.1

LibriSpeech WER

1.11 (clean), 2.23 (other)

Cuts Gemini's error rate by ~2/3
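
Several of the figures listed above can be cross-checked with simple arithmetic. The sketch below is purely illustrative: the constant names are hypothetical, and the parameter counts are the approximate values from the spec sheet, not official numbers.

```python
# Back-of-envelope checks on the listed specifications.
# Every input is an (approximate) figure taken from the spec sheet above.

CONTEXT_TOKENS = 262_144   # listed context window
TOTAL_PARAMS_B = 30        # ~30B total parameters (approximate)
ACTIVE_PARAMS_B = 3        # ~3B active per token (approximate)
VIDEO_SECONDS = 400        # lower bound of the listed "400+ seconds"
VIDEO_FPS = 1              # listed sampling rate

# "256K" uses binary thousands: 256 * 1024 tokens.
context_in_k = CONTEXT_TOKENS / 1024

# Share of the total parameters that the MoE activates for each token.
active_fraction = ACTIVE_PARAMS_B / TOTAL_PARAMS_B

# Number of frames sampled from a clip at the listed rate.
video_frames = VIDEO_SECONDS * VIDEO_FPS

print(f"context: {context_in_k:.0f}K tokens")
print(f"active parameters per token: {active_fraction:.0%}")
print(f"frames in a {VIDEO_SECONDS}s clip at {VIDEO_FPS} FPS: {video_frames}")
```

With the listed values, the 262,144-token window matches the 256K headline figure, roughly one tenth of the parameters are active per token, and a 400-second clip at 1 FPS corresponds to 400 sampled frames.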

Strengths

  • 215 SOTA results across audio, audio-video, visual, and text benchmarks
  • Best-in-class speech recognition: 113 languages, LibriSpeech WER 1.11 (2/3 lower than Gemini)
  • Native end-to-end multimodal: Thinker-Talker architecture jointly trained from scratch
  • Voice cloning from short samples with Seed-zh stability score 1.07 (beats ElevenLabs' 13.08)
  • Minimal text performance gap: MMLU-Redux 94.2 vs 94.3 for standard Qwen3.5-Plus
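
The LibriSpeech figures quoted above are word error rates (WER), which are conventionally reported as percentages. For readers unfamiliar with the metric, the sketch below shows the usual definition: word-level edit distance (substitutions, deletions, insertions) divided by the number of reference words. The function name `word_error_rate` is illustrative and not part of any Qwen tooling, and real evaluations usually normalize casing and punctuation first, which this sketch omits.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return WER as a fraction: word-level edit distance / reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] is the edit distance between the reference words seen so far
    # (excluding the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (ref_word != hyp_word)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


# One dropped word out of six reference words.
wer = word_error_rate("the cat sat on the mat", "the cat sat on mat")
print(f"{wer:.2%}")  # 16.67%
```

On this scale, a reported WER of 1.11 would mean roughly one word error per ninety reference words.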

Weaknesses

  • Requires ~40GB VRAM for comfortable local inference
  • The claim of 215 SOTA results deserves skepticism: niche benchmarks inflate the count
  • Voice cloning in real-world noisy environments not extensively validated
  • API pricing not fully finalized at launch (TBD status)
  • Multimodal architecture adds complexity for text-only use cases
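
To put the VRAM figure in context, a weights-only memory estimate can be derived from the parameter count. This is a rough sketch under stated assumptions: it takes the ~30B total listed above, assumes all experts stay resident in memory (as is typical for MoE inference without offloading), and ignores the KV cache, activations, and any audio or vision encoders. The helper `weight_memory_gb` is hypothetical.

```python
def weight_memory_gb(total_params: float, bytes_per_param: float) -> float:
    """Weights-only memory in decimal GB (excludes KV cache and activations)."""
    return total_params * bytes_per_param / 1e9


TOTAL_PARAMS = 30e9  # ~30B total parameters, as listed above (approximate)

# Assumed storage precisions; the source does not state which one applies.
for label, bytes_per_param in [("16-bit", 2.0), ("8-bit", 1.0), ("4-bit", 0.5)]:
    print(f"{label}: {weight_memory_gb(TOTAL_PARAMS, bytes_per_param):.0f} GB")
```

This yields about 60 GB, 30 GB, and 15 GB respectively. The source does not say which precision its ~40GB figure assumes, and real usage will sit above these weights-only numbers once runtime overhead is included.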

Competitor comparison

Model              Arena  SWE  GPQA          Price
Gemini 3.1 Pro     ~1480  N/A  ~91           Proprietary
GPT-Audio          ~1460  N/A  ~89           Proprietary
Qwen3.5-Omni-Plus  N/A    N/A  ~94.2 (MMLU)  TBD
ElevenLabs         N/A    N/A  N/A           Proprietary TTS
Minimax            N/A    N/A  N/A           Proprietary

Qwen3.5-Omni-Plus is the flagship variant of the Qwen3.5-Omni family, a natively omnimodal model with ~30B total parameters (~3B active) that processes text, images, audio, and video while generating both text and streaming speech in a single forward pass. Released March 30, 2026, it claims 215 SOTA results and delivers best-in-class speech recognition (113 languages, WER 1.11) with voice stability that surpasses ElevenLabs.

Analysis generated: 2026-05-24