GLM-ASR-Nano-2512
GLM-ASR-Nano-2512 is an automatic speech recognition (ASR) model developed by Zhipu AI. It has approximately 1.5B parameters and is released under the Apache 2.0 license.
Parameters
1.5B
Context Window
License
Apache 2.0
Release Date
2025-12-10
API Pricing
API pricing for this model is not yet available
Strengths
- Compact parameter scale of 1.5B
- Open usage under the Apache 2.0 license
- Efficient model file size
Weaknesses
- Details of specialized functions are unknown
- Lack of specific operational cost metrics
- No information on multilingual support range
Use Cases
- Building advanced speech recognition systems
- Transcribing audio data to text
- Open-source audio AI development
Deep Analysis
Model Type
Automatic Speech Recognition (ASR)
Parameters
1.5B
Average Error Rate
4.10 (lowest among comparable models)
Languages
17 (WER ≤ 20%)
License
Apache 2.0
GitHub Stars
806
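The figures above are stated as error rates, and the language count is qualified by a WER threshold. The listing does not say how the 4.10 average is computed (word-level, character-level, or a mix across benchmarks), so the following is only a minimal sketch of the standard word error rate definition: word-level edit distance divided by the number of reference words. The function name `word_error_rate` and the whitespace tokenization are assumptions for illustration; Chinese benchmarks typically use character error rate instead.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()  # assumption: whitespace tokenization
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between ref[:i-1] and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion (reference word missing)
                curr[j - 1] + 1,     # insertion (extra hypothesis word)
                prev[j - 1] + cost,  # substitution or exact match
            )
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    wer = word_error_rate("the cat sat on the mat", "the cat sat on mat")
    print(f"WER: {wer:.2%}")
```

Under this definition, a language would meet the "WER ≤ 20%" criterion when the function returns at most 0.20 on its test set. WER can exceed 1.0 when the hypothesis contains many inserted words.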
Strengths
- Open-source under the Apache 2.0 license
- Compact 1.5B parameter model suitable for edge deployment
- Outperforms Whisper V3 on Chinese benchmarks
- Exceptional Cantonese and dialect recognition
- Robust recognition of low-volume (quiet) speech
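To put the 1.5B parameter count in context for edge deployment, the arithmetic below gives a rough estimate of the memory needed for the weights alone. The listing does not state the released precision, so the data types and the helper `weight_memory_gib` are assumptions; activations, audio feature buffers, and runtime overhead are not counted.

```python
# Bytes per parameter for common storage precisions (assumed, not from the listing).
BYTES_PER_PARAM = {"float32": 4, "float16": 2, "bfloat16": 2, "int8": 1}


def weight_memory_gib(num_params: float, dtype: str) -> float:
    """Approximate memory for model weights only, in GiB (2**30 bytes)."""
    return num_params * BYTES_PER_PARAM[dtype] / 2**30


if __name__ == "__main__":
    for dtype in ("float32", "float16", "int8"):
        print(f"{dtype}: {weight_memory_gib(1.5e9, dtype):.2f} GiB")
```

At 16-bit precision the weights come to a little under 3 GiB, which fits on many consumer GPUs but is still substantial for small embedded devices.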
Weaknesses
- 1.5B parameters still require significant compute on edge devices
- Primarily optimized for the Chinese language family
- English performance may lag behind specialized English models
- Requires transformers 5.0.0 installed from source for best results
- Model weight format changed after December 27, 2025
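Because the model is listed as needing transformers 5.0.0, it can help to check the installed version before loading anything. The sketch below is one possible check, not an official procedure: the helpers `release_tuple` and `meets_minimum` are hypothetical names, and the comparison deliberately treats a source build that reports something like `5.0.0.dev0` as satisfying the 5.0.0 requirement, which is an assumption about how source installs report their version.

```python
import re
from importlib import metadata


def release_tuple(version: str) -> tuple[int, ...]:
    """Extract the leading numeric release, e.g. '5.0.0.dev0' -> (5, 0, 0)."""
    match = re.match(r"\d+(?:\.\d+)*", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    parts = tuple(int(part) for part in match.group().split("."))
    # Pad short versions such as '5.0' so tuples compare component-wise.
    return parts + (0,) * max(0, 3 - len(parts))


def meets_minimum(installed: str, required: str = "5.0.0") -> bool:
    """True when the installed release is at least the required release."""
    return release_tuple(installed) >= release_tuple(required)


if __name__ == "__main__":
    try:
        installed = metadata.version("transformers")
    except metadata.PackageNotFoundError:
        print("transformers is not installed")
    else:
        status = "ok" if meets_minimum(installed) else "too old"
        print(f"transformers {installed}: {status}")
```

The check only covers the library version. It does not detect which weight format a downloaded checkpoint uses, so weights fetched before and after the December 27, 2025 change still need to be matched to the loading code by hand.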
Competitor Comparison
| Model | Availability |
|---|---|
| OpenAI Whisper V3 Large | Open source |
| Whisper V3 Small | Open source |
| Moonshine ASR | Open source |
| NVIDIA Canary 1B | Open source |
GLM-ASR-Nano-2512 is Zhipu AI's open-source speech recognition model with 1.5B parameters, achieving the lowest average error rate (4.10) among comparable open-source models. Released under Apache 2.0, it excels at Chinese, English, and Cantonese recognition with unique low-volume speech robustness. Available on Hugging Face and ModelScope.
Analysis generated: 2026-05-24