Alibaba · Open-source

Qwen3.5-35B-A3B

Name: Qwen3.5-35B-A3B
Author: Alibaba

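Qwen3.5-35B-A3B is a reasoning model developed by Alibaba. It has about 35 billion total parameters, of which roughly 3 billion are active per token, and supports a context window of up to 262,144 tokens.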

Parameters

35B (3B active per token)

Context

262,144 tokens

License

Apache 2.0

Release date

2026-02-25

API pricing

API pricing for this model has not been published yet.

Strengths

・Advanced reasoning capability
・Long-context processing up to 262,144 tokens
・35B total parameters with only 3B active per token

Weaknesses

・High compute resource requirements (the full 35B weights must be held in memory)

Use cases

・Complex logical reasoning
・Analysis of very long documents
・Advanced knowledge extraction

In-depth analysis

Release Date

February 2026

Total Parameters

35B

MoE with 256 experts

Active Parameters

3B per token

Only 3B active — ultra-efficient

Context Window

262,144 tokens

Architecture

Hybrid MoE: Gated DeltaNet + Gated Attention

Modalities

Text, Image, Video

Inference Speed

196 tok/s on RTX 4090

111 tok/s on RTX 3090 at Q4

VRAM (Q4)

~22 GB

License

Apache 2.0

AA Intelligence Index

37 (more than double the class median of 15)

Strengths

・Incredible speed: 196 tok/s on RTX 4090 with only 3B active parameters per token
・Beats previous-gen Qwen3-235B-A22B on core benchmarks despite being much smaller
・Fits on a single RTX 3090/4090 at Q4 quantization (~22GB VRAM)
・Community favorite: r/LocalLLaMA calls it 'the model that's all you need' for practical tasks
・Natively multimodal with text, image, and video support

Weaknesses

・Having only 3B active parameters limits performance on the most complex reasoning tasks
・Creative writing quality may be inferior to the denser 27B model
・LiveCodeBench performance lags behind larger models
・MoE architecture still requires full 35B parameter weights in memory
・Successor Qwen3.6-35B-A3B already announced, making this slightly dated

Competitor comparison

Model	Arena score	SWE-bench	GPQA Diamond	Availability
Qwen3.5-27B	~1400	~68	85.5	Open-source
Qwen3.5-9B	~1370	~60	81.7	Open-source
Llama 4 Scout	~1380	~65	~80	Open-source
Qwen3.5-35B-A3B	~1390	~65	~83	Open-source
Mistral Large	~1380	~64	~78	Open-source

Overview

Qwen3.5-35B-A3B is the speed champion of the Qwen3.5 family: a 35B MoE model that activates only 3B parameters per token and reaches 196 tok/s on an RTX 4090 at Q4 quantization. Despite its small active compute, it beats the previous-generation Qwen3-235B-A22B on core benchmarks. It is the community's recommended daily-driver model for local AI and fits on a single 24 GB consumer GPU at Q4 (~22 GB of VRAM).

Benchmarks and performance

The 35B-A3B punches far above its weight class. The Artificial Analysis Intelligence Index rates it at 37, more than double the class median of 15. It scores about 82 on MMLU-Pro and about 83 on GPQA Diamond, with strong instruction following. Speed is the headline: 196 tok/s on an RTX 4090 and 111 tok/s on an RTX 3090, both at Q4, and 60-70 tok/s on an M4 Max via MLX. These speeds make it viable for real-time interactive applications.
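To make the "real-time" claim concrete, the quoted decode speeds can be turned into wall-clock time per reply. The sketch below is only illustrative arithmetic: the 500-token reply length is a hypothetical figure, the M4 Max entry uses the low end of the quoted 60-70 tok/s range, and prompt-processing time is ignored.

```python
# Decode speeds quoted in the text (tokens per second).
SPEEDS_TOK_PER_S = {
    "RTX 4090 (Q4)": 196.0,
    "RTX 3090 (Q4)": 111.0,
    "M4 Max via MLX (low end)": 60.0,
}


def generation_seconds(num_tokens: int, tok_per_s: float) -> float:
    """Time to decode num_tokens at a steady rate, ignoring prompt processing."""
    if tok_per_s <= 0:
        raise ValueError("tok_per_s must be positive")
    return num_tokens / tok_per_s


if __name__ == "__main__":
    reply_tokens = 500  # hypothetical reply length, chosen for illustration
    for name, speed in SPEEDS_TOK_PER_S.items():
        print(f"{name}: {generation_seconds(reply_tokens, speed):.1f} s")
```

Under these assumptions a 500-token reply takes roughly 2.6 s on the RTX 4090 and 4.5 s on the RTX 3090.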

Detailed comparison

Compared to the 27B dense model, it is significantly faster (196 vs. 35 tok/s on an RTX 4090) but slightly weaker on creative writing and complex reasoning. Compared to the 9B, it is more capable on reasoning tasks, at the cost of more VRAM because all 35B weights must be loaded. Compared to Llama 4 Scout and Mistral Large, it offers competitive quality with dramatically better inference speed. Because only 3B parameters are active per token, per-token compute is comparable to that of a 3B dense model, although the full 35B weights must still be resident in memory.
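The split between memory cost (driven by total parameters) and compute cost (driven by active parameters) can be sketched with back-of-the-envelope arithmetic. Two assumptions here are not from the source: an effective 4.5 bits per weight for a Q4-style quantization, and the common rule of thumb of about 2 FLOPs per active parameter per generated token. The result covers weights only, so it should land somewhat below the ~22 GB VRAM figure quoted for Q4, which presumably also includes the KV cache and runtime overhead.

```python
TOTAL_PARAMS = 35e9   # total parameters, from the text
ACTIVE_PARAMS = 3e9   # active parameters per token, from the text
DENSE_27B = 27e9      # the dense 27B sibling, for comparison


def weight_memory_gb(num_params: float, bits_per_weight: float) -> float:
    """Approximate memory for the weights alone, in GB (1 GB = 1e9 bytes)."""
    return num_params * bits_per_weight / 8 / 1e9


def flops_per_token(active_params: float) -> float:
    """Rule-of-thumb forward-pass cost: about 2 FLOPs per active parameter."""
    return 2 * active_params


if __name__ == "__main__":
    # Assumption: ~4.5 effective bits per weight for a Q4-style format.
    print(f"Weights at Q4: {weight_memory_gb(TOTAL_PARAMS, 4.5):.1f} GB")
    ratio = flops_per_token(DENSE_27B) / flops_per_token(ACTIVE_PARAMS)
    print(f"27B dense needs about {ratio:.0f}x the per-token compute")
```

The measured speed gap quoted above (196 vs. 35 tok/s) is smaller than this theoretical compute ratio, which is expected since token generation is also limited by memory bandwidth and routing overhead.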

Community reception

The model has been received enthusiastically on r/LocalLLaMA and in other local AI communities, where it is widely recommended as the best choice for consumer GPU deployment. Users praise its speed-to-quality ratio, and the 'all you need' moniker reflects genuine satisfaction with real-world performance. Some users prefer the 27B for writing-heavy tasks. The announcement of the successor, Qwen3.6-35B-A3B, has not diminished enthusiasm for the 3.5 version.

Use cases

This is the ideal model for local AI enthusiasts and developers with 24 GB GPUs. It is well suited to coding assistance, batch processing, agent workflows, chat, summarization, and document analysis, and its high speed makes it practical for real-time applications and interactive development. For creative writing, the 27B dense model may be preferable. For teams, API access through DashScope and other providers removes the need for local hardware.
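For document analysis, a quick budget check against the 262,144-token context window listed in the specifications shows whether a document fits in one pass or has to be split. This is a minimal sketch with hypothetical helper names; it assumes token counts have already been obtained from the model's tokenizer, which is not shown here.

```python
import math

CONTEXT_WINDOW = 262_144  # tokens, as listed under "Context Window"


def fits_in_context(prompt_tokens: int, max_output_tokens: int,
                    window: int = CONTEXT_WINDOW) -> bool:
    """True if the prompt plus the reserved output budget fits in the window."""
    return prompt_tokens + max_output_tokens <= window


def chunks_needed(doc_tokens: int, max_output_tokens: int,
                  instruction_tokens: int,
                  window: int = CONTEXT_WINDOW) -> int:
    """Number of passes needed to cover a document of doc_tokens tokens."""
    budget = window - max_output_tokens - instruction_tokens
    if budget <= 0:
        raise ValueError("reserved tokens leave no room for the document")
    return math.ceil(doc_tokens / budget)


if __name__ == "__main__":
    # Hypothetical workload: a 1,000,000-token corpus, 8,192 tokens reserved
    # for the answer and 1,000 tokens for instructions.
    print(fits_in_context(250_000, 8_192))
    print(chunks_needed(1_000_000, 8_192, 1_000))
```

Reserving output and instruction tokens up front avoids requests that are accepted but truncated mid-answer.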