GLM-ASR-2512
GLM-ASR-2512 is a large speech recognition model developed by Zhipu AI. The hosted API model is closed-source; a 1.5B-parameter Nano variant is open-source.
Parameters
Undisclosed
Context Window
License
Proprietary
Release Date
2025-12-10
API Pricing
API pricing for this model is not yet available
Strengths
- Cutting-edge audio processing capabilities
- Advanced design by Zhipu AI
- Latest model architecture
Weaknesses
- Non-open-source license
- Opaque internal structure
- Potential usage restrictions
Use Cases
- Advanced speech recognition tasks
- Analysis and processing of audio data
- Development of next-generation audio AI
Deep Analysis
Model Type
Automatic Speech Recognition (ASR)
Parameters
1.5B (Nano variant)
CER
0.0717, i.e. 7.17% (reported as industry-leading)
Languages
17 (WER ≤ 20%)
Audio Duration Limit
≤ 30 seconds
File Size Limit
≤ 25 MB
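For readers unfamiliar with the metric, CER (character error rate) is conventionally the character-level edit distance between a reference transcript and the model output, divided by the reference length. The sketch below illustrates that standard definition only; it is not Zhipu AI's evaluation code, and the function names `levenshtein` and `cer` are chosen here for illustration.

```python
def levenshtein(ref: str, hyp: str) -> int:
    """Minimum number of character insertions, deletions and substitutions
    needed to turn `ref` into `hyp` (dynamic programming, row by row)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # delete a reference character
                curr[j - 1] + 1,     # insert a hypothesis character
                prev[j - 1] + cost,  # substitute, or match at no cost
            ))
        prev = curr
    return prev[-1]


def cer(reference: str, hypothesis: str) -> float:
    """Character error rate: edit distance divided by reference length."""
    if not reference:
        raise ValueError("reference must be non-empty")
    return levenshtein(reference, hypothesis) / len(reference)
```

Under this definition, one wrong character in a six-character reference gives a CER of 1/6, and a CER of 0.0717 corresponds to roughly 7 character errors per 100 reference characters.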
Strengths
- Industry-leading CER of 0.0717
- Exceptional dialect support including Cantonese
- Low-volume speech robustness (whisper/quiet speech)
- Outperforms OpenAI Whisper V3 on multiple benchmarks
- Efficient custom dictionary for specialized terminology
Weaknesses
- 30-second audio duration limit per request
- 25 MB file size limit
- Primarily optimized for Chinese/English markets
- Closed-source API (Nano variant is open-source)
- May require multiple requests for long audio files
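Because each request is limited to 30 seconds of audio, a longer recording has to be split before transcription. The following is a minimal sketch of how the segment boundaries might be planned. The helper `plan_chunks` and its `overlap` parameter are hypothetical and not part of any Zhipu AI SDK; cutting the audio and sending the requests are left out.

```python
from typing import List, Tuple

MAX_CHUNK_SECONDS = 30.0  # per-request duration limit listed in this card


def plan_chunks(total_seconds: float,
                max_chunk: float = MAX_CHUNK_SECONDS,
                overlap: float = 0.0) -> List[Tuple[float, float]]:
    """Return (start, end) offsets in seconds that cover the whole recording,
    with no segment longer than `max_chunk`.

    `overlap` (an assumption of this sketch) repeats a little audio between
    neighbouring segments so that a word cut at a boundary appears whole
    in at least one of them.
    """
    if total_seconds <= 0:
        return []
    if not 0 <= overlap < max_chunk:
        raise ValueError("overlap must be >= 0 and smaller than max_chunk")

    chunks = []
    start = 0.0
    step = max_chunk - overlap
    while start < total_seconds:
        end = min(start + max_chunk, total_seconds)
        chunks.append((start, end))
        if end >= total_seconds:
            break
        start += step
    return chunks
```

For example, `plan_chunks(75)` yields three segments (0 to 30, 30 to 60, and 60 to 75 seconds), so a 75-second file needs three requests. Each exported segment would also need to stay under the 25 MB file size limit.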
Competitor Comparison
| Model | Price |
|---|---|
| OpenAI Whisper V3 Large | $0.006/min |
| Google Cloud Speech-to-Text V2 | $0.016/min |
| Azure Speech to Text | $1/hour (≈ $0.017/min) |
| AssemblyAI Universal-2 | $0.015/min |
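The prices above mix per-minute and per-hour units, which makes them hard to compare at a glance. As a rough sketch, the snippet below normalizes the listed figures to USD per minute and estimates the cost of a given amount of audio. The figures are copied from the table as listed and may not reflect current vendor pricing; the names `PRICES`, `usd_per_minute` and `cost_for` are illustrative. GLM-ASR-2512 is omitted because its API pricing is not yet available.

```python
# (amount in USD, billing unit) as listed in the comparison table
PRICES = {
    "OpenAI Whisper V3 Large": (0.006, "min"),
    "Google Cloud Speech-to-Text V2": (0.016, "min"),
    "Azure Speech to Text": (1.0, "hour"),
    "AssemblyAI Universal-2": (0.015, "min"),
}

MINUTES_PER_UNIT = {"min": 1, "hour": 60}


def usd_per_minute(amount: float, unit: str) -> float:
    """Convert a listed price to USD per minute of audio."""
    return amount / MINUTES_PER_UNIT[unit]


def cost_for(model: str, minutes: float) -> float:
    """Estimated cost in USD of transcribing `minutes` of audio."""
    amount, unit = PRICES[model]
    return usd_per_minute(amount, unit) * minutes


if __name__ == "__main__":
    # Print the estimated cost of one hour of audio, cheapest first.
    for name in sorted(PRICES, key=lambda m: cost_for(m, 60)):
        print(f"{name}: ${cost_for(name, 60):.2f} per hour of audio")
```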
GLM-ASR-2512 is Zhipu AI's next-generation speech recognition model. It reports a character error rate (CER) of 0.0717, a result described as internationally leading. The model is strongest on Chinese, English, and Cantonese, and remains robust in noisy environments and on low-volume speech. The API version supports real-time transcription for meetings, customer service, and document input.
Analysis generated: 2026-05-24