PaddleOCR-VL-1.5
PaddleOCR-VL-1.5 is a multimodal vision-language model developed by Baidu. It has approximately 0.9 billion parameters and is released under the Apache 2.0 license.
Parameters
0.9B
Context Window
License
Apache 2.0
Release Date
2026-01-29
API Pricing
API pricing for this model is not yet available
Strengths
- Lightweight 0.9B parameters
- Open usage license
- Efficient multimodal processing
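To put the 0.9B parameter count in perspective, the following is a rough back-of-the-envelope sketch of weight storage at common numeric precisions. It assumes only the weights are counted (activations, caches, and runtime overhead are ignored), and the function name `weight_memory_gb` is purely illustrative; it is not part of any PaddleOCR tooling.

```python
# Rough weight-memory estimate for a model with about 0.9B parameters.
# Assumption: only weights are counted; activations, caches, and
# runtime overhead are ignored, so real usage will be higher.

PARAMS = 0.9e9  # approximate parameter count stated on this page

# Bytes per parameter for common numeric formats.
BYTES_PER_PARAM = {"fp32": 4, "fp16/bf16": 2, "int8": 1, "int4": 0.5}


def weight_memory_gb(params: float, bytes_per_param: float) -> float:
    """Return weight storage in gigabytes (1 GB = 1e9 bytes)."""
    return params * bytes_per_param / 1e9


if __name__ == "__main__":
    for fmt, size in BYTES_PER_PARAM.items():
        print(f"{fmt:>10}: {weight_memory_gb(PARAMS, size):.2f} GB")
```

Under these assumptions, half-precision weights come to roughly 1.8 GB, which is the kind of footprint that makes edge deployment plausible.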
Weaknesses
- Relatively small model scale
- Geared toward specific applications
- Limited general reasoning ability
Use Cases
- Multimodal data analysis
- Integrating OCR functionality
- Visual processing on edge devices
Deep Analysis
Architecture
VLM (0.9B)
Ultra-compact document parsing model
Accuracy
94.5% on OmniDocBench v1.5
SOTA for document parsing
License
Open-source (Apache 2.0)
Release Date
January 2026
Specialization
OCR + Document Parsing
Multi-task VLM
Key Features
Seal recognition, text spotting
New capabilities in v1.5
Strengths
- State-of-the-art 94.5% accuracy on OmniDocBench v1.5
- Ultra-compact 0.9B parameters
- Robust against real-world distortions (scanning, skew, warping)
- Seal recognition and text spotting
- Multilingual support, including Tibetan and Bengali
- Cross-page table merging
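As an illustration of what cross-page table merging means in practice, here is a minimal post-processing sketch. It is not PaddleOCR-VL-1.5's actual mechanism, which this page does not describe. It assumes a parser has already produced per-page table fragments, and it treats a fragment as a continuation when it sits on the next page and repeats the same header. The `TableFragment` type and `merge_cross_page_tables` function are hypothetical names introduced for this example.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class TableFragment:
    """Hypothetical container for one table piece found on one page."""
    page: int
    header: List[str]
    rows: List[List[str]]


def merge_cross_page_tables(fragments: List[TableFragment]) -> List[TableFragment]:
    """Join fragments that continue a table from the preceding page.

    Heuristic (an assumption, not the model's method): a fragment continues
    the previous table if it is on the very next page and has the same header.
    """
    merged: List[TableFragment] = []
    last_page: Optional[int] = None
    for frag in sorted(fragments, key=lambda f: f.page):
        continues = (
            bool(merged)
            and last_page is not None
            and frag.page == last_page + 1
            and frag.header == merged[-1].header
        )
        if continues:
            # Append the continuation rows to the table that started earlier.
            merged[-1].rows.extend(frag.rows)
        else:
            # Copy the lists so the caller's input is never mutated.
            merged.append(TableFragment(frag.page, list(frag.header), list(frag.rows)))
        last_page = frag.page
    return merged


if __name__ == "__main__":
    parts = [
        TableFragment(1, ["Item", "Qty"], [["A", "1"]]),
        TableFragment(2, ["Item", "Qty"], [["B", "2"]]),
        TableFragment(4, ["Item", "Qty"], [["C", "3"]]),
    ]
    for table in merge_cross_page_tables(parts):
        print(table.page, table.rows)
```

In this sketch the fragments on pages 1 and 2 are joined into one table, while the fragment on page 4 stays separate because page 3 breaks the sequence. A real system would likely also need to handle continuation pages that omit the repeated header.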
Weaknesses
- Specialized for document parsing only
- Not a general-purpose language model
- Limited to OCR-related tasks
Competitor Comparison
| Model | Arena | SWE-bench | GPQA | Price |
|---|---|---|---|---|
| PaddleOCR-VL (v1) | - | - | - | Free |
| Surya OCR | - | - | - | Free |
| Google Document AI | - | - | - | Paid |
PaddleOCR-VL-1.5 is Baidu's ultra-compact 0.9B vision-language model for document parsing, achieving a state-of-the-art 94.5% accuracy on OmniDocBench v1.5. It handles real-world distortions, seal recognition, text spotting, and multilingual OCR, including Tibetan and Bengali.
Analysis generated: 2026-05-24