Qwen3.5-35B-A3B
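Qwen3.5-35B-A3B is a reasoning model developed by Alibaba. It has about 35 billion total parameters, of which roughly 3 billion are active per token, and supports a context window of 262,144 tokens.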
Parameters
35B
Context
262,144 tokens
License
https://huggingface.co/Qwen/Qwen2.5-72B/blob/main/LICENSE
Release date
2026-02-25
API pricing
API pricing for this model has not been published.
Strengths
- Advanced reasoning capability
- Long-context processing up to 262,144 tokens
- 35B total parameters with only 3B active per token
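Weaknesses
- High compute resource requirements
- Full 35B weights must be held in memory even though only 3B are active per token
- Limited performance on the most complex reasoning tasks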
Use cases
- Complex logical reasoning
- Analysis of very long documents
- Advanced knowledge extraction
In-depth analysis
Release Date
February 2026
Total Parameters
35B
MoE with 256 experts
Active Parameters
3B per token
Only about 9% of the total weights are active per token
Context Window
262,144 tokens
Architecture
Hybrid MoE: Gated DeltaNet + Gated Attention
Modalities
Text, Image, Video
Inference Speed
196 tok/s on RTX 4090
111 tok/s on RTX 3090 at Q4
VRAM (Q4)
~22 GB
License
Apache 2.0
AA Intelligence Index
37
More than double the class median of 15
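The spec above lists 256 experts but only about 3B of 35B parameters active per token. A minimal sketch of how top-k expert routing can produce that kind of sparsity is shown below. It is an illustration, not Qwen's actual router: `TOP_K = 8` is a hypothetical value (the source does not say how many experts fire per token), the router logits are random, and renormalizing a softmax over the selected experts is just one common variant.

```python
import math
import random

NUM_EXPERTS = 256  # from the spec above
TOP_K = 8          # hypothetical: the source does not state experts-per-token


def softmax(values):
    """Numerically stable softmax over a list of floats."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def route(router_logits, k=TOP_K):
    """Pick the k highest-scoring experts and return (index, weight) pairs.

    Weights are a softmax over the selected logits only, so they sum to 1.
    """
    ranked = sorted(range(len(router_logits)),
                    key=lambda i: router_logits[i],
                    reverse=True)
    top = ranked[:k]
    weights = softmax([router_logits[i] for i in top])
    return list(zip(top, weights))


# Random logits stand in for the output of a learned router layer.
rng = random.Random(0)
logits = [rng.gauss(0.0, 1.0) for _ in range(NUM_EXPERTS)]
chosen = route(logits)

print(f"{len(chosen)} of {NUM_EXPERTS} experts run for this token")
```

Only the chosen experts' feed-forward weights are used for the token, which is why per-token compute tracks the active parameter count rather than the total.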
Strengths
- High throughput: 196 tok/s on RTX 4090 with only 3B active parameters per token
- Beats previous-gen Qwen3-235B-A22B on core benchmarks despite being much smaller
- Fits on a single RTX 3090/4090 at Q4 quantization (~22 GB VRAM)
- Community favorite: r/LocalLLaMA calls it 'the model that's all you need' for practical tasks
- Natively multimodal with text, image, and video support
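To put the reported throughput figures in practical terms, the following sketch converts tokens per second into wall-clock time for a response. It uses only the two speeds quoted in this page and ignores prompt processing time, so treat the results as rough lower bounds rather than measurements.

```python
def generation_seconds(num_tokens, tokens_per_second):
    """Time to decode num_tokens at a steady rate (prompt processing excluded)."""
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    return num_tokens / tokens_per_second


# Throughput figures as reported in the spec block of this page.
REPORTED_SPEEDS = {
    "RTX 4090": 196.0,
    "RTX 3090 (Q4)": 111.0,
}

for gpu, speed in REPORTED_SPEEDS.items():
    seconds = generation_seconds(1000, speed)
    print(f"{gpu}: {seconds:.1f} s per 1,000 generated tokens")
# RTX 4090: 5.1 s per 1,000 generated tokens
# RTX 3090 (Q4): 9.0 s per 1,000 generated tokens
```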
Weaknesses
- Having only 3B active parameters limits performance on the most complex reasoning tasks
- Creative writing quality may be inferior to the denser 27B model
- LiveCodeBench performance lags behind larger models
- MoE architecture still requires the full 35B parameter weights in memory
- Successor Qwen3.6-35B-A3B already announced, making this slightly dated
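The memory point above follows from simple arithmetic: weight storage scales with total parameters, not active ones. The sketch below estimates weight memory alone for a few bit widths. The 4.5-bit row is an assumed effective width for Q4-style formats that store scales alongside the weights. The gap between these estimates and the ~22 GB quoted on this page is presumably overhead such as the KV cache and activations, which the sketch does not model.

```python
def weight_memory_gb(total_params_billion, bits_per_weight):
    """Approximate weight storage in GB (1 GB = 1e9 bytes), weights only."""
    return total_params_billion * bits_per_weight / 8


def active_fraction(active_params_billion, total_params_billion):
    """Share of the weights that take part in computing one token."""
    return active_params_billion / total_params_billion


TOTAL_B = 35.0   # total parameters, from this page
ACTIVE_B = 3.0   # active parameters per token, from this page

# 4.5 bits is an assumed effective width, not a figure from the source.
for bits in (4, 4.5, 16):
    print(f"{bits}-bit weights: {weight_memory_gb(TOTAL_B, bits):.1f} GB")
# 4-bit weights: 17.5 GB
# 4.5-bit weights: 19.7 GB
# 16-bit weights: 70.0 GB

print(f"Active per token: {active_fraction(ACTIVE_B, TOTAL_B):.1%}")
# Active per token: 8.6%
```

All 35B weights must be resident because the router can select any expert for any token, even though under 9% of them do work on a given token.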
Competitor comparison
| Model | Arena score | SWE-bench (%) | GPQA (%) | Price |
|---|---|---|---|---|
| Qwen3.5-27B | ~1400 | ~68 | 85.5 | Open-source |
| Qwen3.5-9B | ~1370 | ~60 | 81.7 | Open-source |
| Llama 4 Scout | ~1380 | ~65 | ~80 | Open-source |
| Qwen3.5-35B-A3B | ~1390 | ~65 | ~83 | Open-source |
| Mistral Large | ~1380 | ~64 | ~78 | Open-source |
Qwen3.5-35B-A3B is the fastest model in the Qwen3.5 family: a 35B MoE model that activates only 3B parameters per token and reaches 196 tok/s on an RTX 4090 at Q4 quantization. Despite the small active parameter count, it beats the previous-generation Qwen3-235B-A22B on core benchmarks. The community recommends it as a daily-driver model for local AI, since at Q4 (~22 GB VRAM) it fits on a single RTX 3090 or 4090.
Sources
Analysis generated: 2026-05-24