Qwen3.5-35B-A3B
Qwen3.5-35B-A3B is a reasoning model developed by Alibaba. It has approximately 35B total parameters, of which about 3B are active per token, and supports a context window of up to 1M tokens.
Parameters
35B total (3B active per token)
Context Window
1M
License
Apache 2.0
Release Date
2026-02-25
API Pricing
API pricing for this model is not yet available
Strengths
- Advanced reasoning capabilities
- Long-context processing (up to 1M tokens)
- 35B total parameters with only 3B active per token
Weaknesses
- High memory requirements: all 35B parameters must be loaded, even though only 3B are active per token
Use Cases
- Complex logical reasoning
- Ultra-long document analysis
- Advanced knowledge extraction
Deep Analysis
Release Date
February 2026
Total Parameters
35B
MoE with 256 experts
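The card gives the expert count but not how many experts each token is routed to. The following is a minimal sketch of generic top-k MoE gating, assuming a hypothetical k of 8 and random router logits; it illustrates the mechanism only and is not Qwen's actual router.

```python
import math
import random

NUM_EXPERTS = 256  # from the card
TOP_K = 8  # hypothetical: the card does not state how many experts are routed per token


def route(logits, k=TOP_K):
    """Return (expert_index, weight) pairs for the k highest-scoring experts.

    Weights are a softmax over only the selected logits, so they sum to 1.
    """
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    m = max(logits[i] for i in top)  # subtract the max for numerical stability
    exps = [math.exp(logits[i] - m) for i in top]
    total = sum(exps)
    return [(i, e / total) for i, e in zip(top, exps)]


if __name__ == "__main__":
    random.seed(0)
    logits = [random.gauss(0.0, 1.0) for _ in range(NUM_EXPERTS)]
    chosen = route(logits)
    print(len(chosen), "experts used out of", NUM_EXPERTS)
    print(round(sum(w for _, w in chosen), 6))
```

Only the selected experts' feed-forward blocks run for a given token, which is why per-token compute tracks active rather than total parameters.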
Active Parameters
3B per token
About 8.6% of total parameters are active per token, keeping per-token compute low
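A rough way to see what 3B active out of 35B total means for compute is the common rule of thumb that a forward pass costs about 2 FLOPs per parameter used per token. This approximation ignores attention over the context, and the card's "3B" and "35B" are rounded, so the ratios are approximate.

```python
TOTAL_PARAMS = 35e9   # total parameters (from the card)
ACTIVE_PARAMS = 3e9   # parameters active per token (from the card)


def forward_flops_per_token(params: float) -> float:
    """Approximate forward-pass FLOPs per token as 2 * params (one multiply + one add)."""
    return 2.0 * params


active_fraction = ACTIVE_PARAMS / TOTAL_PARAMS
moe_flops = forward_flops_per_token(ACTIVE_PARAMS)
dense_flops = forward_flops_per_token(TOTAL_PARAMS)  # a hypothetical dense 35B model

print(f"active fraction: {active_fraction:.1%}")       # 8.6%
print(f"MoE:   {moe_flops / 1e9:.0f} GFLOPs/token")    # 6
print(f"dense: {dense_flops / 1e9:.0f} GFLOPs/token")  # 70
print(f"ratio: {dense_flops / moe_flops:.1f}x")        # 11.7x
```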
Context Window
262,144 tokens
Architecture
Hybrid MoE: Gated DeltaNet + Gated Attention
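Gated DeltaNet layers replace the growing KV cache of softmax attention with a fixed-size state matrix that is updated once per token. The following is a minimal single-head sketch of the published gated delta rule, S_t = alpha_t * S_{t-1} (I - beta_t k_t k_t^T) + beta_t v_t k_t^T with output o_t = S_t q_t. The dimensions and gate values are illustrative assumptions, and the model's real layers add projections, normalization, and chunked parallel kernels that are not shown here.

```python
import math
from typing import List

Vector = List[float]
Matrix = List[List[float]]  # state S has shape d_v x d_k


def l2_normalize(x: Vector) -> Vector:
    """Scale x to unit length (keys are normalized so the delta rule stays stable)."""
    norm = math.sqrt(sum(v * v for v in x)) or 1.0
    return [v / norm for v in x]


def gated_delta_step(S: Matrix, k: Vector, v: Vector, alpha: float, beta: float) -> Matrix:
    """One gated delta-rule update: S <- alpha * S (I - beta k k^T) + beta v k^T.

    alpha in [0, 1] decays the whole state; beta in [0, 1] sets how strongly
    the old value stored under key k is replaced by the new value v.
    """
    # S k: what the state currently recalls for key k (length d_v)
    Sk = [sum(row[j] * k[j] for j in range(len(k))) for row in S]
    return [
        [alpha * (S[i][j] - beta * Sk[i] * k[j]) + beta * v[i] * k[j]
         for j in range(len(k))]
        for i in range(len(v))
    ]


def read(S: Matrix, q: Vector) -> Vector:
    """Layer output for query q: o = S q."""
    return [sum(row[j] * q[j] for j in range(len(q))) for row in S]


if __name__ == "__main__":
    d_k, d_v = 4, 3
    S = [[0.0] * d_k for _ in range(d_v)]
    k = l2_normalize([1.0, 2.0, 0.0, -1.0])
    # With alpha=1 (no decay) and beta=1 (full write), querying with the
    # same unit-norm key recalls the stored value.
    S = gated_delta_step(S, k, [0.5, -1.0, 2.0], alpha=1.0, beta=1.0)
    print([round(x, 6) for x in read(S, k)])  # [0.5, -1.0, 2.0]
```

Writing a new value under the same key with beta=1 overwrites the old association rather than adding to it, which is the "delta" correction that distinguishes this rule from plain linear attention; the memory cost stays d_v x d_k per head regardless of context length.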
Modalities
Text, Image, Video
Inference Speed
196 tok/s on RTX 4090
111 tok/s on RTX 3090 at Q4
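These throughput figures convert directly into per-token latency. The sketch below assumes a steady decode rate and ignores prompt processing (prefill); the 1,000-token response length is a hypothetical example. The card only specifies Q4 for the RTX 3090 figure, so the two rows are not guaranteed to be a like-for-like comparison.

```python
def ms_per_token(tokens_per_second: float) -> float:
    """Average decode latency per token, in milliseconds."""
    return 1000.0 / tokens_per_second


def seconds_for(n_tokens: int, tokens_per_second: float) -> float:
    """Time to decode n_tokens at a steady rate (prefill not included)."""
    return n_tokens / tokens_per_second


RATES = {"RTX 4090": 196.0, "RTX 3090 (Q4)": 111.0}  # from the card

for gpu, rate in RATES.items():
    print(f"{gpu}: {ms_per_token(rate):.1f} ms/token, "
          f"{seconds_for(1000, rate):.1f} s per 1,000 tokens")
```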
VRAM (Q4)
~22 GB
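Because every expert must be resident in memory, the VRAM estimate is driven by all 35B parameters, not the 3B active ones. The sketch below estimates weight storage only. The effective bits-per-weight values are assumptions (4-bit formats usually store scales alongside the quantized values, pushing the average above 4.0), and KV cache, activations, and runtime buffers come on top, which is plausibly how the total reaches about 22 GB.

```python
TOTAL_PARAMS = 35e9  # all experts must be loaded, not just the ~3B active per token


def weight_gb(params: float, bits_per_weight: float) -> float:
    """Weight storage in decimal gigabytes (1 GB = 1e9 bytes)."""
    return params * bits_per_weight / 8 / 1e9


# Assumed effective rates; real Q4 variants differ by format and implementation.
for label, bpw in [("pure 4-bit", 4.0), ("~4.5 bpw", 4.5), ("~5.0 bpw", 5.0)]:
    print(f"{label}: {weight_gb(TOTAL_PARAMS, bpw):.1f} GB of weights")
```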
License
Apache 2.0
Artificial Analysis (AA) Intelligence Index
37
More than double the class median of 15
Strengths
- Fast inference: 196 tok/s on an RTX 4090, with only 3B parameters active per token
- Beats the previous-generation Qwen3-235B-A22B on core benchmarks despite being much smaller
- Fits on a single RTX 3090 or RTX 4090 at Q4 quantization (~22 GB VRAM)
- Community favorite: users on r/LocalLLaMA have called it "the model that's all you need" for practical tasks
- Natively multimodal, with text, image, and video support
Weaknesses
- Having only 3B active parameters per token limits performance on the most complex reasoning tasks
- Creative writing quality may trail that of the denser Qwen3.5-27B
- LiveCodeBench performance lags behind larger models
- Despite sparse activation, all 35B parameters must still be held in memory
- A successor, Qwen3.6-35B-A3B, has already been announced, so this model is slightly dated
Competitor Comparison
| Model | Arena score | SWE-bench (%) | GPQA (%) | Availability |
|---|---|---|---|---|
| Qwen3.5-27B | ~1400 | ~68 | 85.5 | Open-source |
| Qwen3.5-9B | ~1370 | ~60 | 81.7 | Open-source |
| Llama 4 Scout | ~1380 | ~65 | ~80 | Open-source |
| Qwen3.5-35B-A3B | ~1390 | ~65 | ~83 | Open-source |
| Mistral Large | ~1380 | ~64 | ~78 | Open-source |
Qwen3.5-35B-A3B is the speed champion of the Qwen3.5 family: a 35B MoE model that activates only 3B parameters per token and reaches 196 tok/s on an RTX 4090 at Q4 quantization. Despite its minimal active compute, it beats the previous-generation Qwen3-235B-A22B on core benchmarks. It is the community's recommended daily-driver model for local AI, fitting on a single 24 GB consumer GPU.
Analysis generated: 2026-05-24