Qwen3.5-Omni-Light
Qwen3.5-Omni-Light is a multimodal foundation model developed by Alibaba. It supports a long 256K-token context window for advanced multimodal processing.
Parameters
Undisclosed
Context
256K
License
https://huggingface.co/Qwen/Qwen2.5-72B/blob/main/LICENSE
Release date
2026-03-30
API pricing
API pricing for this model has not been published.
Strengths
- Broad multimodal support
- 256K long-context processing
- Alibaba's latest design
Weaknesses
- Closed-source license format
- No detailed performance metrics
- Potential restrictions on commercial use
Use cases
- Long-document analysis
- Multimodal data processing
- Advanced contextual understanding
In-depth analysis
| Metric | Value | Note |
|---|---|---|
| Arena Elo | N/A | Not officially benchmarked on LMArena for this specific variant |
| MMAU (Audio Understanding) | 82.2 | SOTA; outperforms Gemini 3.1 Pro (81.1) |
| LibriSpeech Clean WER | 1.11% | SOTA; ~3x lower error than Gemini 3.1 Pro (3.36%) |
| Input Price (Flash) | $0.10 per 1M tokens | Budget tier; ~20x cheaper than GPT-5.2 |
| Context Window | 256K tokens | Supports 10+ hours of audio or 400 seconds of 720p video |
| Speech Recognition Languages | 113 | Up from 19 in the previous generation |
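The LibriSpeech figure is a word error rate (WER): the word-level edit distance (substitutions + deletions + insertions) between the model's transcript and the reference, divided by the number of reference words. The sketch below is a minimal, generic implementation of that standard definition, not the evaluation script used for Qwen3.5-Omni; it assumes plain whitespace tokenization and skips the text normalization (casing, punctuation) that real LibriSpeech scoring applies.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return (substitutions + deletions + insertions) / number of reference words."""
    ref = reference.split()  # naive whitespace tokenization, no normalization
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] = edit distance between the reference prefix processed so far
    # and hyp[:j]; row 0 compares an empty reference against hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)  # ref[:i] vs empty hypothesis: i deletions
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,         # deletion: reference word dropped
                curr[j - 1] + 1,     # insertion: extra hypothesis word
                prev[j - 1] + cost,  # match (cost 0) or substitution (cost 1)
            )
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    reference = "the cat sat on the mat"
    hypothesis = "the cat sit on mat"  # one substitution, one deletion
    print(f"WER: {word_error_rate(reference, hypothesis):.2%}")  # WER: 33.33%
```

For scale, a 1.11% WER corresponds to roughly one wrong, extra, or missing word per 90 reference words.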
Strengths
- State-of-the-art audio understanding and generation, beating Gemini 3.1 Pro on key benchmarks.
- Exceptional multilingual support with 113 languages for speech recognition and 36 for synthesis.
- Cost-effective API pricing, significantly undercutting major Western competitors.
- Unique 'Audio-Visual Vibe Coding' enables code generation from spoken instructions and visual context.
- Advanced real-time interaction with semantic interruption and voice cloning capabilities.
Weaknesses
- Only the 'Light' variant has open weights; 'Plus' and 'Flash' are proprietary API-only models.
- The '215 SOTA results' claim includes many niche, per-language subtasks; broad independent verification is pending.
- High computational cost for processing long video/audio can lead to unpredictable API expenses.
- Audio generation quality is optimized for English and Mandarin; other languages can sound less natural.
- Data processing occurs in Chinese data centers, raising potential latency and data sovereignty concerns for some users.
Competitor comparison
| Model | Arena Elo | SWE-bench | GPQA | Price (input/output) |
|---|---|---|---|---|
| Gemini 3.1 Pro | N/A | N/A | N/A | $2.00-$4.00/$12.00-$18.00 per 1M tokens (estimated) |
| GPT-4o / GPT-Audio | N/A | N/A | N/A | $2.50/$10.00 per 1M tokens (GPT-4o text-only) |
| ElevenLabs (Multilingual v2) | N/A | N/A | N/A | Not directly comparable; specialized voice API |
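To make the price gap concrete, the sketch below applies simple linear per-token billing to one request that fills the whole context window. It is a back-of-the-envelope estimate under stated assumptions: the Flash input price stands in for Light (which has no published price), "256K" is read as 256 × 1024 tokens, and media tokens are assumed to be billed at the flat input rate, which this page does not confirm. The names `PRICES_USD_PER_M`, `FULL_CONTEXT_TOKENS`, and `input_cost_usd` are illustrative only.

```python
# Input prices in USD per 1M tokens, as quoted on this page.
# Flash is a proxy: the Light variant has no published API price.
PRICES_USD_PER_M = {
    "Qwen3.5-Omni-Flash (input)": 0.10,
    "GPT-4o (text input)": 2.50,
}

# Assumption: "256K" is read as 256 * 1024 tokens.
FULL_CONTEXT_TOKENS = 256 * 1024


def input_cost_usd(tokens: int, usd_per_million: float) -> float:
    """Linear per-token billing: tokens / 1M * price (input side only)."""
    return tokens / 1_000_000 * usd_per_million


if __name__ == "__main__":
    for name, price in PRICES_USD_PER_M.items():
        cost = input_cost_usd(FULL_CONTEXT_TOKENS, price)
        print(f"{name}: ${cost:.4f} for one full-context request")
    ratio = (PRICES_USD_PER_M["GPT-4o (text input)"]
             / PRICES_USD_PER_M["Qwen3.5-Omni-Flash (input)"])
    print(f"Input price ratio: {ratio:.0f}x")
```

This gives roughly $0.026 versus $0.66 per full-window request, a 25x gap on input price alone. It ignores output tokens, and the GPT-4o figure is a text-only price, so it is not a like-for-like comparison for audio or video input.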
Qwen3.5-Omni-Light is the lightweight variant within Alibaba's Qwen3.5-Omni family, released on March 30, 2026. It represents a significant leap in native omnimodal AI, designed to process text, images, audio, and video in a single model pass and generate both text and real-time speech. The architecture, based on a Thinker-Talker framework with Hybrid-Attention MoE, is optimized for efficiency, making the 'Light' version suitable for edge and resource-constrained deployments. While specific parameter counts for the Light variant are undisclosed, it shares the family's core capabilities, including a massive 256K token context window and support for 113 languages in speech recognition.
Positioned as the most accessible entry point in the series, Qwen3.5-Omni-Light is available as open weights, allowing for local deployment and fine-tuning under a Qwen License (free commercial use). This contrasts with the flagship 'Plus' and balanced 'Flash' variants, which are proprietary and accessed via API. The model's primary innovation is its ability to handle long-form audio (10+ hours) and video (400+ seconds of 720p) natively, a feature that unlocks applications like full-podcast analysis, meeting transcription with visual context, and real-time multilingual voice agents. Benchmark claims from the Plus variant (215 SOTA results) position the family as a leader in audio and audio-visual tasks, though the Light variant's specific performance tier is less documented.
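Dividing the context window by the quoted durations gives a rough upper bound on the average token budget per second of media. This assumes "256K" means 2^18 = 262,144 tokens and that the entire window is spent on the media (no text prompt or output); these are implied figures, not published tokenizer rates.

$$
r_{\text{audio}} \le \frac{262{,}144\ \text{tokens}}{10 \times 3600\ \text{s}} \approx 7.3\ \text{tokens/s},
\qquad
r_{\text{video}} \le \frac{262{,}144\ \text{tokens}}{400\ \text{s}} \approx 655\ \text{tokens/s}
$$

The two bounds differ by a factor of 36,000 / 400 = 90, which suggests why long 720p video consumes context (and, under per-token billing, budget) far faster than audio.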
Sources
- Qwen3.5-Omni Technical Report (arXiv)
- Qwen3.5-Omni Official Announcement & Blog
- DataLearnerAI - Model Overview and Benchmarks
- StableLearn - Detailed Capability Breakdown
- BuildFastWithAI - Independent Review & Analysis
- Apidog - Review and API Access Guide
- ComputerTech - Comprehensive Review with Benchmarks
- 36氪 (36Kr) - Chinese Launch Coverage & Hands-on Test
Analysis generated: 2026-05-23