Alibaba

Qwen3.5-35B-A3B

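Qwen3.5-35B-A3B is a reasoning model developed by Alibaba. It is a mixture-of-experts model with about 35 billion total parameters, of which about 3 billion are active per token, and it supports a context window of 262,144 tokens.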

Parameters

35B

Context

262,144 tokens

License

Apache 2.0

Release date

2026-02-25

API pricing

API pricing for this model has not been published yet.

Strengths

  • Advanced reasoning capability
  • Long-context processing, up to 262,144 tokens
  • 35B total parameters with only 3B active per token

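Weaknesses

  • Full 35B weights must stay in memory even though only 3B are active per token
  • High memory requirement: about 22 GB of VRAM even at Q4 quantization
  • Weaker than larger models on the most complex reasoning tasks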

Use cases

  • Complex logical reasoning
  • Analysis of very long documents
  • Advanced knowledge extraction

In-depth analysis

Release Date

February 2026

Total Parameters

35B

MoE with 256 experts

Active Parameters

3B per token

Only 3B of the 35B parameters are active per token

Context Window

262,144 tokens

Architecture

Hybrid MoE: Gated DeltaNet + Gated Attention

Modalities

Text, Image, Video

Inference Speed

196 tok/s on RTX 4090 at Q4

111 tok/s on RTX 3090 at Q4

VRAM (Q4)

~22 GB

License

Apache 2.0

AA Intelligence Index

37

More than double the class median of 15
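
As a rough illustration of what the throughput figures above mean in practice, the following Python sketch converts them into generation time for a reply of a given length. The 1,000-token reply length is an assumption chosen only for illustration, and the calculation ignores prompt processing time, so it is a back-of-the-envelope estimate rather than a benchmark.

```python
# Back-of-the-envelope use of the spec-sheet numbers above.
# Throughput and parameter counts come from the spec sheet; the reply
# length is an ASSUMPTION chosen only for illustration.

TOKENS_PER_SECOND = {"RTX 4090": 196.0, "RTX 3090": 111.0}
TOTAL_PARAMS_B = 35.0   # total parameters, in billions
ACTIVE_PARAMS_B = 3.0   # active parameters per token, in billions
REPLY_TOKENS = 1000     # assumption, not from the source


def generation_seconds(num_tokens: int, tokens_per_second: float) -> float:
    """Time to emit num_tokens at a steady decode rate (prompt processing ignored)."""
    return num_tokens / tokens_per_second


def active_fraction(active_b: float, total_b: float) -> float:
    """Share of the parameters that take part in computing each token."""
    return active_b / total_b


if __name__ == "__main__":
    for gpu, tps in TOKENS_PER_SECOND.items():
        seconds = generation_seconds(REPLY_TOKENS, tps)
        print(f"{gpu}: {seconds:.1f} s for {REPLY_TOKENS} tokens")
    share = active_fraction(ACTIVE_PARAMS_B, TOTAL_PARAMS_B)
    print(f"active share per token: {share:.1%}")
```

Under these assumptions, a 1,000-token reply takes roughly 5 seconds on the RTX 4090 and roughly 9 seconds on the RTX 3090, and about 8.6% of the parameters are active for any one token.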

Strengths

  • Incredible speed: 196 tok/s on RTX 4090 with only 3B active parameters per token
  • Beats previous-gen Qwen3-235B-A22B on core benchmarks despite being much smaller
  • Fits on a single RTX 3090/4090 at Q4 quantization (~22GB VRAM)
  • Community favorite: r/LocalLLaMA calls it 'the model that's all you need' for practical tasks
  • Natively multimodal with text, image, and video support

Weaknesses

  • Only 3B active parameters limits performance on the most complex reasoning tasks
  • Creative writing quality may be inferior to the denser 27B model
  • LiveCodeBench performance lags behind larger models
  • MoE architecture still requires full 35B parameter weights in memory
  • Successor Qwen3.6-35B-A3B already announced, making this slightly dated
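
The memory point in the list above can be made concrete with a little arithmetic. The Python sketch below estimates the size of the weights alone at a 4-bit quantization. The 4.5 bits-per-weight average is an assumption (real Q4 formats vary and store extra scaling data), so the result is only an approximation of the page's ~22 GB figure.

```python
# Rough weight-memory estimate for a quantized MoE model.
# Parameter counts come from the spec sheet; BITS_PER_WEIGHT is an
# ASSUMPTION, since the page does not say which Q4 format it measured.

TOTAL_PARAMS = 35e9     # every expert must be resident in memory
ACTIVE_PARAMS = 3e9     # parameters actually used for one token
BITS_PER_WEIGHT = 4.5   # assumed average for a 4-bit quantization format


def weight_memory_gb(num_params: float, bits_per_weight: float) -> float:
    """Memory for the weights alone, in decimal gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bits_per_weight / 8 / 1e9


if __name__ == "__main__":
    resident = weight_memory_gb(TOTAL_PARAMS, BITS_PER_WEIGHT)
    touched = weight_memory_gb(ACTIVE_PARAMS, BITS_PER_WEIGHT)
    print(f"weights resident in memory: {resident:.1f} GB")
    print(f"weights read per token:     {touched:.1f} GB")
```

With these assumed values the resident weights come to about 19.7 GB, while only about 1.7 GB of them are read for any one token. The gap between 19.7 GB and the quoted ~22 GB would presumably be covered by the KV cache, activations, and runtime buffers, which this sketch does not model.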

Competitor comparison

Model            Arena  SWE  GPQA  Price
Qwen3.5-27B      ~1400  ~68  85.5  Open-source
Qwen3.5-9B       ~1370  ~60  81.7  Open-source
Llama 4 Scout    ~1380  ~65  ~80   Open-source
Qwen3.5-35B-A3B  ~1390  ~65  ~83   Open-source
Mistral Large    ~1380  ~64  ~78   Open-source
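
To see where Qwen3.5-35B-A3B sits in this comparison, the following Python sketch ranks the models on each score column. It assumes each row reads as Model, Arena, SWE, GPQA, Price and treats the approximate ("~") values as plain numbers. The `Row` type and `rank` helper are hypothetical names introduced for this illustration.

```python
from typing import Dict, List, NamedTuple


class Row(NamedTuple):
    model: str
    arena: float  # approximate ("~") in the source table
    swe: float    # approximate ("~") in the source table
    gpqa: float   # partly approximate in the source table


# Values transcribed from the comparison table above.
ROWS: List[Row] = [
    Row("Qwen3.5-27B", 1400, 68, 85.5),
    Row("Qwen3.5-9B", 1370, 60, 81.7),
    Row("Llama 4 Scout", 1380, 65, 80),
    Row("Qwen3.5-35B-A3B", 1390, 65, 83),
    Row("Mistral Large", 1380, 64, 78),
]


def rank(rows: List[Row], column: str) -> Dict[str, int]:
    """1-based rank per model on one column; higher is better, ties share a rank."""
    ranks = {}
    for row in rows:
        value = getattr(row, column)
        # Rank = 1 + number of models that score strictly higher.
        ranks[row.model] = 1 + sum(getattr(other, column) > value for other in rows)
    return ranks


if __name__ == "__main__":
    for column in ("arena", "swe", "gpqa"):
        print(column, rank(ROWS, column)["Qwen3.5-35B-A3B"])
```

With the values as read here, Qwen3.5-35B-A3B ranks second on all three columns, behind Qwen3.5-27B, and ties with Llama 4 Scout on SWE. Because most of the scores are approximate, small differences in rank should not be over-interpreted.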

Qwen3.5-35B-A3B is the speed champion of the Qwen3.5 family: a 35B MoE model that activates only 3B parameters per token and reaches 196 tok/s on an RTX 4090 at Q4 quantization. Despite its small active compute, it beats the previous-generation Qwen3-235B-A22B on core benchmarks. The community recommends it as a daily-driver model for local AI, since it fits on a single consumer GPU (about 22 GB of VRAM at Q4).

Analysis generated: 2026-05-24