PaddleOCR-VL-1.5

Name: PaddleOCR-VL-1.5
Author: Baidu

PaddleOCR-VL-1.5 is a compact multimodal vision-language model (VLM) developed by Baidu. It has approximately 0.9 billion parameters and is released under the Apache 2.0 license.

Parameters

0.9B

License

Apache 2.0

Release Date

2026-01-29

API Pricing

API pricing for this model is not yet available

Strengths

・Lightweight at 0.9B parameters
・Permissive open license (Apache 2.0)
・Efficient multimodal processing

Weaknesses

・Relatively small model scale
・Narrow focus on specific applications
・Limited general reasoning ability

Use Cases

・Multimodal data analysis
・Integrating OCR functionality
・Visual processing on edge devices

Deep Analysis

Architecture: VLM (0.9B), an ultra-compact document parsing model
Accuracy: 94.5% on OmniDocBench v1.5 (SOTA for document parsing)
License: Open-source
Release Date: January 2026
Specialization: OCR + Document Parsing (multi-task VLM)
Key Features: Seal recognition and text spotting (new capabilities in v1.5)

Strengths

・State-of-the-art 94.5% accuracy on OmniDocBench v1.5
・Ultra-compact at 0.9B parameters
・Robust to real-world distortions (scanning, skew, warping)
・Seal recognition and text spotting
・Multilingual support, including Tibetan and Bengali
・Cross-page table merging
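
To make the cross-page table merging item above more concrete, here is a minimal sketch of what such a merge could look like as a post-processing step. It is an illustration only and not a description of how PaddleOCR-VL-1.5 does it internally: the `TableFragment` container, the `merge_cross_page` helper, and the "same header on the next page" rule are all assumptions made for this example.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class TableFragment:
    """Hypothetical container for one table fragment found on one page."""
    page: int
    header: list[str]
    rows: list[list[str]] = field(default_factory=list)


def merge_cross_page(fragments: list[TableFragment]) -> list[TableFragment]:
    """Join a fragment onto the previous table when it appears on the very
    next page and repeats the same header. This is a simple heuristic."""
    merged: list[TableFragment] = []
    last_page = None  # page of the most recently processed fragment
    for frag in sorted(fragments, key=lambda f: f.page):
        continues_previous = (
            bool(merged)
            and last_page is not None
            and frag.page == last_page + 1
            and frag.header == merged[-1].header
        )
        if continues_previous:
            # Same header on the following page: treat as a continuation.
            merged[-1].rows.extend(frag.rows)
        else:
            # Copy the lists so the input fragments are not mutated later.
            merged.append(TableFragment(frag.page, list(frag.header), list(frag.rows)))
        last_page = frag.page
    return merged
```

A table split across pages 1 and 2 with an identical header would come out as one table, while a table on page 4 would stay separate because page 3 interrupts the sequence.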

Weaknesses

・Specialized for document parsing only
・Not a general-purpose language model
・Limited to OCR-related tasks

Competitor Comparison

Model               Arena  SWE  GPQA  Price
PaddleOCR-VL (v1)   -      -    -     Free
Surya OCR           -      -    -     Free
Google Document AI  -      -    -     Paid

Overview

PaddleOCR-VL-1.5 is Baidu's ultra-compact 0.9B VLM for document parsing, achieving a state-of-the-art (SOTA) accuracy of 94.5% on OmniDocBench v1.5. It handles real-world distortions, seal recognition, text spotting, and multilingual OCR, including Tibetan and Bengali.

Benchmarks & Performance

The model is reported to achieve SOTA results on OmniDocBench v1.5 and Real5-OmniDocBench. It is also reported to outperform mainstream open-source and proprietary models across five real-world scenarios: scanning, skew, warping, screen photography, and illumination changes.
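
This page does not say how the benchmark scores are computed. As background, a common building block for scoring OCR text output is normalized edit distance between the predicted text and the reference text. The sketch below shows that generic metric; it is not claimed to be the exact OmniDocBench scoring formula, and the function names are chosen for this example.

```python
def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(
                prev[j] + 1,         # delete ca
                curr[j - 1] + 1,     # insert cb
                prev[j - 1] + cost,  # substitute, or keep on a match
            ))
        prev = curr
    return prev[-1]


def normalized_edit_distance(pred: str, ref: str) -> float:
    """Edit distance divided by the longer length; 0.0 means identical text."""
    longest = max(len(pred), len(ref))
    if longest == 0:
        return 0.0  # two empty strings are treated as a perfect match
    return levenshtein(pred, ref) / longest
```

For example, a prediction of `"Invoice 1O24"` against the reference `"Invoice 1024"` differs by one substituted character out of twelve, giving a normalized distance of 1/12. Lower is better, and an accuracy-style score can be derived as one minus this value.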

Detailed Comparison

At 0.9B parameters, the model is significantly more compact than alternatives while achieving SOTA accuracy on OmniDocBench v1.5. It is positioned as best-in-class for robustness in real-world document parsing.
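
To put the compactness in concrete terms, the weight storage for 0.9 billion parameters can be estimated with simple arithmetic. The sketch below is a rough estimate only: it assumes weights dominate memory use and ignores activations, caches, and runtime overhead. The precision list is illustrative, and this page does not state which formats the released checkpoint uses.

```python
# Rough weight-memory estimate for a 0.9B-parameter model.
# Assumption: only the weights are counted; activations and runtime
# overhead are ignored, so real memory use will be higher.

PARAMS = 0.9e9  # parameter count stated on this page

# Bytes per parameter for common storage formats (illustrative set).
BYTES_PER_PARAM = {"fp32": 4, "fp16/bf16": 2, "int8": 1}


def weight_memory_gb(params: float, bytes_per_param: int) -> float:
    """Return the weight size in gigabytes, using 1 GB = 1e9 bytes."""
    return params * bytes_per_param / 1e9


if __name__ == "__main__":
    for name, nbytes in BYTES_PER_PARAM.items():
        print(f"{name}: {weight_memory_gb(PARAMS, nbytes):.1f} GB")
```

Under these assumptions the weights take about 3.6 GB in fp32, 1.8 GB in 16-bit formats, and 0.9 GB in int8, which is consistent with the edge-device use case listed on this page.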

Community Feedback

The model is part of the PaddleOCR ecosystem and is available on Hugging Face and Baidu AI Studio.

Use Cases

Typical applications include document digitization, OCR pipelines, invoice and receipt processing, multilingual document understanding, and seal/stamp recognition.

Latest News

Released in January 2026 as an upgrade to PaddleOCR-VL. The release reports SOTA accuracy on the new Real5-OmniDocBench benchmark.

Analysis generated: 2026-05-24