Baidu · Open Source

PaddleOCR-VL-1.5

PaddleOCR-VL-1.5 is a compact vision-language model (VLM) for OCR and document parsing, developed by Baidu. It has approximately 0.9 billion parameters and is released under the Apache 2.0 license.

Parameters

0.9B

Context Window

License

Apache 2.0

Release Date

2026-01-29

API Pricing

API pricing for this model is not yet available

Strengths

  • Lightweight at 0.9B parameters
  • Permissive open-source license (Apache 2.0)
  • Efficient multimodal processing

Weaknesses

  • Relatively small model scale
  • Geared toward specific applications rather than general use
  • Limited general reasoning ability

Use Cases

  • Multimodal data analysis
  • Integrating OCR functionality
  • Visual processing on edge devices
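
To make the edge-device use case concrete, the weight storage for a model of this size can be estimated with simple arithmetic. The snippet below is a rough sketch, assuming exactly 0.9 billion parameters and the usual byte widths for common numeric formats. Which formats the model is actually distributed in is not stated here, and the estimate ignores activations, caches, and runtime overhead.

```python
# Rough weight-memory estimate for a model with ~0.9 billion parameters.
# Assumption: the parameter count is exactly 0.9e9; real checkpoints differ slightly.
PARAMS = 0.9e9

# Standard storage width per parameter for each numeric format.
BYTES_PER_PARAM = {"fp32": 4, "fp16/bf16": 2, "int8": 1, "int4": 0.5}


def weight_memory_gb(params: float, bytes_per_param: float) -> float:
    """Return weight storage in gigabytes (1 GB = 1e9 bytes)."""
    return params * bytes_per_param / 1e9


for fmt, width in BYTES_PER_PARAM.items():
    print(f"{fmt:>10}: {weight_memory_gb(PARAMS, width):.2f} GB")
```

Under these assumptions the weights alone take roughly 3.6 GB in fp32 and about 1.8 GB in 16-bit formats, which is the main reason a model of this scale is a plausible candidate for constrained hardware.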

Deep Analysis

Architecture

VLM (0.9B)

Ultra-compact document parsing model

Accuracy

94.5% on OmniDocBench v1.5

SOTA for document parsing

License

Open-source

Release Date

January 2026

Specialization

OCR + Document Parsing

Multi-task VLM

Key Features

Seal recognition, text spotting

New capabilities in v1.5

Strengths

  • 94.5% accuracy on OmniDocBench v1.5, reported as state of the art (SOTA)
  • Ultra-compact at 0.9B parameters
  • Robust against real-world distortions (scanning, skew, warping)
  • Seal recognition and text spotting
  • Multilingual support, including Tibetan and Bengali
  • Cross-page table merging
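
As an illustration of what cross-page table merging means in practice, the sketch below joins table fragments that sit on consecutive pages and share the same header row. It is a simplified heuristic for explanation only, not the method PaddleOCR-VL-1.5 uses. The `TableFragment` type, its fields, and the `merge_cross_page` function are hypothetical names introduced here and do not reflect the model's output schema.

```python
from dataclasses import dataclass, field


@dataclass
class TableFragment:
    """Hypothetical parsed table piece; not PaddleOCR-VL's output schema."""
    first_page: int
    last_page: int
    header: list[str]
    rows: list[list[str]] = field(default_factory=list)


def merge_cross_page(fragments: list[TableFragment]) -> list[TableFragment]:
    """Join fragments on consecutive pages that share an identical header."""
    merged: list[TableFragment] = []
    for frag in sorted(fragments, key=lambda f: f.first_page):
        prev = merged[-1] if merged else None
        if (prev is not None
                and frag.first_page == prev.last_page + 1
                and frag.header == prev.header):
            # Continuation: extend the previous table instead of starting a new one.
            prev.rows.extend(list(r) for r in frag.rows)
            prev.last_page = frag.last_page
        else:
            # Copy the fragment so the caller's objects are not mutated.
            merged.append(TableFragment(frag.first_page, frag.last_page,
                                        list(frag.header),
                                        [list(r) for r in frag.rows]))
    return merged


fragments = [
    TableFragment(1, 1, ["Item", "Qty"], [["Bolt", "10"]]),
    TableFragment(2, 2, ["Item", "Qty"], [["Nut", "25"]]),
    TableFragment(4, 4, ["Name", "Date"], [["Report", "2026-01-29"]]),
]
tables = merge_cross_page(fragments)
print(len(tables), tables[0].rows)
```

In this example the fragments on pages 1 and 2 become one table spanning both pages, while the page 4 table stays separate because it has a different header and is not on an adjacent page. A real system would also need to handle continuation tables that omit the repeated header.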

Weaknesses

  • Specialized for document parsing only
  • Not a general-purpose language model
  • Limited to OCR-related tasks

Competitor Comparison

Model              | Arena | SWE | GPQA | Price
PaddleOCR-VL (v1)  | -     | -   | -    | Free
Surya OCR          | -     | -   | -    | Free
Google Document AI | -     | -   | -    | Paid

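PaddleOCR-VL-1.5 is Baidu's ultra-compact 0.9B-parameter VLM for document parsing. It scores 94.5% on OmniDocBench v1.5, reported as state of the art for document parsing. It handles real-world distortions such as scanning artifacts, skew, and warping, and it supports seal recognition, text spotting, cross-page table merging, and multilingual OCR, including Tibetan and Bengali.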

Analysis generated: 2026-05-24