What are the strengths of this model?

Design specialized for visual recognition Sufficient 9 billion scale parameters Available under open license

What are the weaknesses of this model?

Limited 8K context length Versatility unknown due to specialization Requires consistent memory consumption

What are the best use cases?

Advanced in-image text recognition Digitizing visual information Automating document analysis

Back to Models

Zhipu AIOpen Source

GLM-OCR

Name: GLM-OCR
Author: Zhipu AI

GLM-OCR is a visual large model developed by Zhipu AI. With approximately 9 billion parameters, it is an open multimodal model released under the Apache 2.0 license.

Parameters

9.0B

Context Window

License

Apache 2.0

Release Date

2026-02-03

API Pricing

API pricing for this model is not yet available

Strengths

・Design specialized for visual recognition
・Sufficient 9 billion scale parameters
・Available under open license

Weaknesses

・Limited 8K context length
・Versatility unknown due to specialization
・Requires consistent memory consumption

Use Cases

・Advanced in-image text recognition
・Digitizing visual information
・Automating document analysis

Deep Analysis

Parameters

0.9B (900M)

OmniDocBench v1.5

94.62

#1 overall

OCRBench

94.0

Throughput

1.86 pages/sec (PDF)

Pricing

~$0.03/1M tokens

Languages

8 languages

Release Date

March 2025

Strengths

・SOTA OmniDocBench v1.5 (94.62) with only 0.9B params
・1/10 cost of traditional OCR
・Multi-Token Prediction ~50% throughput boost
・8 languages, edge-deployable

Weaknesses

・Very specialized
・No video/interactive tasks
・KIE trails Gemini-3-Pro

Competitor Comparison

Model	Price
PaddleOCR-VL-1.5	N/A
MinerU2.5	N/A
DeepSeek-OCR	N/A

Overview

GLM-OCR is a 0.9B model achieving SOTA on OmniDocBench v1.5 (94.62), surpassing 235B models. ~$0.03/1M tokens, 8 languages.

Benchmarks & Performance

94.62 OmniDocBench (#1), 94.0 OCRBench, 96.5 UniMERNet. 50% throughput via Multi-Token Prediction.

Detailed Comparison

Matches PaddleOCR-VL-1.5 at similar size. 10x cheaper than traditional OCR.

Community Feedback

Positive for SOTA at 0.9B. Edge deployment capability highlighted.

Use Cases

Document OCR, table parsing, KIE, batch processing for RAG pipelines.

Latest News

Released March 2025. Available via vLLM, SGLang, Ollama.

GLM-OCR is a 0.9B model achieving SOTA on OmniDocBench v1.5 (94.62), surpassing 235B models. ~$0.03/1M tokens, 8 languages.

Sources

arXiv

Analysis generated: 2026-05-24