Baidu · Open Source

ERNIE-4.5-VL-28B-A3B-Thinking Multimodal Reasoning Model

ERNIE-4.5-VL-28B-A3B-Thinking is a multimodal reasoning model developed by Baidu. It is a mixture-of-experts (MoE) model with 28B total parameters, of which about 3B are active per token, and it supports a 131K context window.

Parameters

28B total (3B active)

Context Window

131K

License

Apache 2.0

Release Date

2025-11-11

API Pricing

API pricing for this model is not yet available

Strengths

  • Strong multimodal reasoning capability
  • 28B total parameters with only 3B active per token (MoE)
  • Long 131K context support

Weaknesses

  • Memory footprint reflects all 28B parameters, even though only 3B are active per token
  • Potentially slower responses, since a reasoning-specialized model produces intermediate thinking output
  • May require optimization for specific tasks

Use Cases

  • Logical analysis of complex visual information
  • Multimodal analysis of long documents
  • Advanced visual problem-solving requiring reasoning

Deep Analysis

Architecture

MoE (28B total, 3B active)

Multimodal reasoning VLM
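
The 28B total / 3B active split can be made concrete with some back-of-the-envelope arithmetic. The sketch below is a rough illustration, not a measurement: it counts weight storage only (no activations, KV cache, or runtime overhead), and the listed precisions are assumptions chosen for illustration, not formats this page confirms are published for the model.

```python
# Rough weight-memory arithmetic for a 28B-total / 3B-active MoE model.
# Parameter counts come from the model card; everything else is an assumption.

TOTAL_PARAMS_B = 28.0   # total parameters, in billions (from the card)
ACTIVE_PARAMS_B = 3.0   # parameters active per token, in billions (from the card)

# Assumed storage precisions (bytes per parameter), for illustration only.
BYTES_PER_PARAM = {"bf16": 2.0, "int8": 1.0, "int4": 0.5}


def weight_memory_gb(params_billion: float, bytes_per_param: float) -> float:
    """Approximate weight storage in GB, using 1 GB = 1e9 bytes."""
    return params_billion * 1e9 * bytes_per_param / 1e9


def active_fraction(active_billion: float, total_billion: float) -> float:
    """Share of the parameters that take part in computing one token."""
    return active_billion / total_billion


def weight_memory_table() -> dict:
    """Weight storage for the full model at each assumed precision."""
    return {
        name: weight_memory_gb(TOTAL_PARAMS_B, nbytes)
        for name, nbytes in BYTES_PER_PARAM.items()
    }


if __name__ == "__main__":
    share = active_fraction(ACTIVE_PARAMS_B, TOTAL_PARAMS_B)
    print(f"active share per token: {share:.1%}")
    for name, gb in weight_memory_table().items():
        print(f"{name}: ~{gb:.0f} GB of weights")
```

The point of the split is that per-token compute scales with the roughly 3B active parameters, while memory has to accommodate all 28B, because any expert may be selected for a given token.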

License

Apache 2.0

Open-source

Modalities

Text + Image

Vision-language model

Release Date

2025-11-11

Part of ERNIE 4.5 family

Framework

PaddlePaddle + Transformers

Strengths

  • Apache 2.0 open-source license
  • Lightweight 3B active parameters
  • Multimodal reasoning with thinking capability
  • Part of comprehensive ERNIE 4.5 family
  • Available on HuggingFace and AI Studio

Weaknesses

  • Small active parameter count may limit performance on complex tasks
  • Tooling and documentation centered on the Chinese model ecosystem (PaddlePaddle, AI Studio)
  • Limited benchmark data available

Competitor Comparison

Model                 Arena   SWE   GPQA   Price
ERNIE 4.5 (larger)    -       -     -      Higher
Qwen-VL               -       -     -      Comparable
InternVL2             -       -     -      Comparable

ERNIE-4.5-VL-28B-A3B-Thinking is Baidu's open-source multimodal reasoning model, with 28B total and 3B active parameters. Part of the ERNIE 4.5 family, it combines vision and language understanding with thinking/reasoning capabilities and is released under the Apache 2.0 license.

Analysis generated: 2026-05-24