Alibaba | Apache 2.0

Qwen3.5-35B-A3B

Qwen3.5-35B-A3B is a reasoning model developed by Alibaba. It is a mixture-of-experts (MoE) model with approximately 35B total parameters, about 3B of which are active per token, and it supports a context window of 262,144 tokens.

Parameters

35B total (about 3B active per token)

Context Window

262,144 tokens

License

Apache 2.0

Release Date

2026-02-25

API Pricing

API pricing for this model is not yet available

Strengths

  • Advanced reasoning capabilities
  • Long-context processing (262,144 tokens)
  • Efficient MoE design: 35B total parameters, about 3B active per token

Weaknesses

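  • Full 35B weights must be held in memory despite the small active set (~22 GB VRAM at Q4)
  • Only 3B active parameters per token, which limits performance on the hardest reasoning tasks
  • LiveCodeBench performance lags behind larger models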

Use Cases

  • Complex logical reasoning
  • Ultra-long document analysis
  • Advanced knowledge extraction

Deep Analysis

Release Date

February 2026

Total Parameters

35B

MoE with 256 experts

Active Parameters

3B per token

Only 3B of the 35B parameters are active per token, roughly 9% of the total
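
To illustrate how a model with 256 experts can run only a small share of its weights per token, here is a minimal sketch of top-k expert routing in plain Python. It is a toy illustration, not the model's actual router: the value of `TOP_K` and the random router scores are assumptions made for this example, since the page does not state how many experts are selected per token.

```python
import math
import random

NUM_EXPERTS = 256  # expert count listed on this page
TOP_K = 8          # assumption for illustration; not stated on this page


def route(router_logits, k=TOP_K):
    """Return (expert_index, weight) pairs for the k highest-scoring experts."""
    top = sorted(
        range(len(router_logits)),
        key=lambda i: router_logits[i],
        reverse=True,
    )[:k]
    # Softmax over the selected logits only, so the k weights sum to 1.
    peak = max(router_logits[i] for i in top)
    exps = [math.exp(router_logits[i] - peak) for i in top]
    total = sum(exps)
    return [(i, e / total) for i, e in zip(top, exps)]


random.seed(0)
logits = [random.gauss(0.0, 1.0) for _ in range(NUM_EXPERTS)]  # stand-in router scores
selected = route(logits)

print(len(selected), "of", NUM_EXPERTS, "experts run for this token")
print(f"active share of total parameters: {3 / 35:.1%}")
```

Only the selected experts contribute compute for a given token, which is why the active parameter count (3B) can be far smaller than the total (35B) even though every expert still has to be stored.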

Context Window

262,144 tokens

Architecture

Hybrid MoE: Gated DeltaNet + Gated Attention

Modalities

Text, Image, Video

Inference Speed

196 tok/s on RTX 4090 at Q4

111 tok/s on RTX 3090 at Q4
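
As a rough guide to what these throughput figures mean in practice, the sketch below converts tokens per second into wall-clock generation time. It assumes steady-state decoding at the reported rate and ignores prompt processing and warm-up, so real latencies will be higher. The function name `generation_seconds` is made up for this example.

```python
def generation_seconds(num_tokens: int, tokens_per_second: float) -> float:
    """Estimate decode time, assuming a constant generation rate."""
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    return num_tokens / tokens_per_second


# Throughput figures as reported on this page.
REPORTED_TOK_PER_S = {"RTX 4090": 196.0, "RTX 3090 (Q4)": 111.0}

for gpu, rate in REPORTED_TOK_PER_S.items():
    seconds = generation_seconds(1000, rate)
    print(f"{gpu}: about {seconds:.1f} s per 1,000 generated tokens")
```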

VRAM (Q4)

~22 GB
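
The VRAM figure can be sanity-checked with simple arithmetic: weight storage is roughly total parameters times bits per weight, divided by eight. The sketch below assumes an effective 4.5 bits per weight for a Q4 quantization (4-bit values plus per-block scales). That number is an assumption for illustration and varies by quantization format. Whatever remains of the reported ~22 GB would go to the KV cache and runtime buffers.

```python
def weight_memory_gb(total_params: float, bits_per_weight: float) -> float:
    """Approximate weight storage in decimal gigabytes."""
    return total_params * bits_per_weight / 8 / 1e9


TOTAL_PARAMS = 35e9     # total parameters listed on this page
BITS_PER_WEIGHT = 4.5   # assumption: effective Q4 size including scales
REPORTED_VRAM_GB = 22.0 # Q4 VRAM figure listed on this page

weights_gb = weight_memory_gb(TOTAL_PARAMS, BITS_PER_WEIGHT)
headroom_gb = REPORTED_VRAM_GB - weights_gb

print(f"weights: about {weights_gb:.1f} GB")
print(f"remaining for KV cache and buffers: about {headroom_gb:.1f} GB")
```

The estimate uses all 35B parameters rather than the 3B active ones, because an MoE model must keep every expert resident in memory even though only a few run per token.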

License

Apache 2.0

AA Intelligence Index

37

More than double the class median of 15

Strengths

  • High throughput: 196 tok/s on RTX 4090 with only 3B active parameters per token
  • Beats previous-gen Qwen3-235B-A22B on core benchmarks despite being much smaller
  • Fits on a single RTX 3090/4090 at Q4 quantization (~22GB VRAM)
  • Community favorite: r/LocalLLaMA calls it 'the model that's all you need' for practical tasks
  • Natively multimodal with text, image, and video support

Weaknesses

  • Only 3B active parameters limits performance on the most complex reasoning tasks
  • Creative writing quality may be inferior to the denser 27B model
  • LiveCodeBench performance lags behind larger models
  • MoE architecture still requires full 35B parameter weights in memory
  • Successor Qwen3.6-35B-A3B already announced, making this slightly dated

Competitor Comparison

Model             Arena   SWE   GPQA   Price
Qwen3.5-27B       ~1400   ~68   85.5   Open-source
Qwen3.5-9B        ~1370   ~60   81.7   Open-source
Llama 4 Scout     ~1380   ~65   ~80    Open-source
Qwen3.5-35B-A3B   ~1390   ~65   ~83    Open-source
Mistral Large     ~1380   ~64   ~78    Open-source

Qwen3.5-35B-A3B is the speed champion of the Qwen3.5 family: a 35B MoE model that activates only 3B parameters per token and reaches 196 tok/s on an RTX 4090 at Q4 quantization. Despite its small active compute, it beats the previous-generation Qwen3-235B-A22B on core benchmarks. The community recommends it as a daily-driver model for local AI because it fits on a single consumer GPU at Q4 (~22 GB VRAM).

Analysis generated: 2026-05-24