Alibaba · Proprietary

Qwen3.5-Omni-Flash

Qwen3.5-Omni-Flash is a multimodal large language model developed by Alibaba. It supports a 256K-token context window, which lets it process long inputs in a single request.

Parameters

Undisclosed

Context

256K

License

https://huggingface.co/Qwen/Qwen2.5-72B/blob/main/LICENSE

Release Date

2026-03-30

API Price

API pricing for this model has not been published yet.
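
The 256K context figure listed above can be read as 256 × 1,024 = 262,144 tokens. The following is a minimal sketch of a pre-flight size check under that reading. The function name `fits_in_context` and the token counts are hypothetical, and it assumes that prompt tokens and generated tokens share the same window, which this page does not confirm.

```python
# "256K" read as 256 * 1024 tokens (matches the 262,144 figure in the spec list).
CONTEXT_WINDOW_TOKENS = 256 * 1024


def fits_in_context(prompt_tokens: int, reserved_output_tokens: int = 0) -> bool:
    """Return True if the request fits in the context window.

    Assumption: prompt and generated tokens count against one shared window.
    """
    return prompt_tokens + reserved_output_tokens <= CONTEXT_WINDOW_TOKENS


print(CONTEXT_WINDOW_TOKENS)            # 262144
print(fits_in_context(260_000, 2_000))  # True  (262,000 <= 262,144)
print(fits_in_context(260_000, 4_096))  # False (264,096 > 262,144)
```

Token counts depend on the model's tokenizer, so a real check would count tokens with that tokenizer rather than estimate them from character length.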

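Strengths

  • Advanced multimodal processing
  • Long 256K context
  • Fast response times

Weaknesses

  • Closed licensing
  • No detailed evaluation metrics
  • Limited track record as a new model

Use Cases

  • Large-scale document analysis
  • Multimodal information processing
  • Real-time applications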

In-Depth Analysis

Release Date

March 30, 2026

Architecture

Thinker-Talker, Hybrid-Attention MoE

Context Window

262,144 tokens

Max Audio Input

10+ hours continuous

Max Video Input

400+ seconds at 720p/1FPS

Speech Recognition

113 languages

Speech Generation

36 languages

Input Modalities

Text, Image, Audio, Video

Output Modalities

Text, Streaming Speech

API Price

~$0.065 per 1M text input tokens, $0.260 per 1M output tokens

Budget tier of Omni family
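
As a rough illustration of the text rates listed above, the sketch below estimates the cost of a single text-only request. The function name `estimate_text_cost` and the example token counts are hypothetical, and the rates are the approximate figures from this page. Audio, image, and video input pricing is not listed here and is not modeled.

```python
# Approximate text rates from the listing above (USD per 1M tokens).
INPUT_USD_PER_MTOK = 0.065
OUTPUT_USD_PER_MTOK = 0.260


def estimate_text_cost(input_tokens: int, output_tokens: int) -> float:
    """Return the estimated USD cost of one text-only request."""
    total = input_tokens * INPUT_USD_PER_MTOK + output_tokens * OUTPUT_USD_PER_MTOK
    return total / 1_000_000


# Hypothetical request: 200,000 input tokens and 2,000 output tokens.
cost = estimate_text_cost(200_000, 2_000)
print(f"${cost:.5f}")  # $0.01352
```

At these rates an output token costs four times as much as an input token, so long generations dominate the bill faster than long prompts do.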

Strengths

  • Budget-friendly omnimodal model: text, image, audio, and video input with speech output
  • Natively end-to-end multimodal: no adapter or separate TTS pipeline needed
  • 113 languages for speech recognition, 36 for speech generation
  • Low latency for real-time voice chat applications
  • Apache 2.0 licensed, available for self-hosting via HuggingFace

Weaknesses

  • Lower quality than the Plus variant on audio and vision benchmarks
  • Benchmark scores trail Gemini 3.1 Pro on several audio understanding tasks
  • Limited documentation on specific parameter count and architecture details
  • Voice cloning quality may not match dedicated TTS solutions
  • Real-world performance in noisy environments not extensively tested

Competitor Comparison

Model                Arena    SWE    GPQA            Price
Qwen3.5-Omni-Plus    N/A      N/A    ~94.2 (MMLU)    TBD
Gemini 3.1 Pro       ~1480    N/A    ~91             Proprietary
GPT-Audio            ~1460    N/A    ~89             Proprietary
Qwen3.5-Omni-Flash   N/A      N/A    ~92 (MMLU)      $0.065/$0.260
ElevenLabs           N/A      N/A    N/A             Proprietary TTS

Qwen3.5-Omni-Flash is the budget tier of the Qwen3.5-Omni family, released March 30, 2026. It is a natively omnimodal model that accepts text, images, audio, and video as input and produces both text and streaming speech output in a single forward pass. The Flash variant trades some benchmark quality for lower latency and cost, making it suitable for real-time voice chat and high-volume applications.

Analysis generated: 2026-05-24