

Zhipu AI | Open Source

GLM-4.6V 106B-A12B

Name: GLM-4.6V 106B-A12B
Author: Zhipu AI

GLM-4.6V 106B-A12B is a multimodal foundation model developed by Zhipu AI. It is equipped with a 128K context window and is released under the MIT license.

Parameters

108B total (12B active)

Context Window

128K

License

MIT

Release Date

2025-12-08

API Pricing

API pricing for this model is not yet available

Strengths

・Powerful multimodal processing abilities
・Wide context window of 128K tokens
・Open availability under the permissive MIT license

Weaknesses

・Heavy load from the very large parameter count
・High computational resource requirements
・Inference cost that grows with model scale
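To give a rough sense of the resource load, the sketch below estimates the memory needed just to store the weights of a model with 108B total parameters at a few common precisions. It is a back-of-the-envelope calculation under stated assumptions: it counts weights only (no KV cache, activations, or runtime overhead), and the bytes-per-parameter values are standard storage widths, not figures published for this model. With MoE, all experts typically have to be resident even though only 12B parameters are active per token.

```python
# Rough weight-memory estimate for a model with 108B total parameters.
# Assumption: weights only; KV cache, activations, and runtime overhead are ignored.

TOTAL_PARAMS = 108e9  # total parameters (MoE: all experts are stored)

# Standard storage widths per parameter, in bytes.
BYTES_PER_PARAM = {
    "bf16": 2.0,
    "fp8": 1.0,
    "int4": 0.5,
}


def weight_memory_gb(num_params: float, precision: str) -> float:
    """Return the memory needed to hold the weights, in GB (1 GB = 1e9 bytes)."""
    return num_params * BYTES_PER_PARAM[precision] / 1e9


if __name__ == "__main__":
    for precision in BYTES_PER_PARAM:
        print(f"{precision}: ~{weight_memory_gb(TOTAL_PARAMS, precision):.0f} GB")
```

Even at 4-bit precision the weights alone land around 54 GB, which is why serving this model usually needs multiple high-memory GPUs.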

Use Cases

・Advanced image and text analysis
・Understanding of long documents
・Multimodal AI development
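For long-document work, a quick pre-check is whether the input fits in the 128K-token window. The sketch below is a hypothetical helper, not part of any Z.ai tooling. It treats 128K as 128,000 tokens and assumes about 850 tokens per page, a figure derived only from this page's own "150+ pages in 128K" claim (128,000 / 150 ≈ 853). Real counts depend on the tokenizer and the content, so in practice the input should be measured with an actual tokenizer.

```python
# Rough check of whether a document fits in a 128K-token context window.
# Assumptions (hypothetical, for illustration only):
#   - 128K is treated as 128,000 tokens.
#   - ~850 tokens per page on average.

CONTEXT_TOKENS = 128_000
TOKENS_PER_PAGE = 850  # assumed average


def fits_in_context(pages: int, reserved_output_tokens: int = 4_000) -> bool:
    """Return True if `pages` of input plus reserved output tokens fit in the window."""
    estimated_input = pages * TOKENS_PER_PAGE
    return estimated_input + reserved_output_tokens <= CONTEXT_TOKENS


def max_pages(reserved_output_tokens: int = 4_000) -> int:
    """Largest page count that fits after reserving tokens for the model's reply."""
    return (CONTEXT_TOKENS - reserved_output_tokens) // TOKENS_PER_PAGE


if __name__ == "__main__":
    print(fits_in_context(100))  # True: 85,000 + 4,000 <= 128,000
    print(fits_in_context(150))  # False: 127,500 + 4,000 > 128,000
    print(max_pages())           # 145
```

With no reserved output, `max_pages(0)` returns 150, matching the page count cited for this model; leaving room for the reply lowers the usable page budget.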

Deep Analysis

Parameters

108B total / 12B active MoE

Context Window

128K tokens

Artificial Analysis (AA) Intelligence Index

23

Pricing

$0.30 input / $0.90 output per 1M tokens

License

MIT

Native Function Call

Yes (the first GLM VLM to support it)

Release Date

December 2025

Strengths

・First GLM VLM with native multimodal function calling
・128K context, enough for 150+ pages of documents or about 1 hour of video
・MIT license
・Competitive pricing ($0.30/$0.90 per 1M input/output tokens)
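To make the pricing concrete, the sketch below estimates the cost of a single request. It assumes the usual reading of "$0.30/$0.90" as $0.30 per 1M input tokens and $0.90 per 1M output tokens; the helper name is hypothetical, and actual billing rules (caching discounts, image-token accounting) may differ.

```python
# Estimate API cost from per-million-token prices.
# Assumption: $0.30 per 1M input tokens and $0.90 per 1M output tokens.

INPUT_PRICE_PER_M = 0.30   # USD per 1M input tokens (assumed)
OUTPUT_PRICE_PER_M = 0.90  # USD per 1M output tokens (assumed)


def request_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Return the estimated cost of one request in USD."""
    return (input_tokens * INPUT_PRICE_PER_M
            + output_tokens * OUTPUT_PRICE_PER_M) / 1_000_000


if __name__ == "__main__":
    # A full 128K-token prompt with a 2,000-token answer.
    print(f"${request_cost_usd(128_000, 2_000):.4f}")
```

Because output tokens cost three times as much as input tokens, verbose responses weigh more heavily on the bill than long prompts do.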

Weaknesses

・Slow output speed (36.7 tokens/s)
・Verbose output (90M tokens used during the AA Intelligence Index evaluation)
・Moderate AA Intelligence Index score (23)
・Superseded by GLM-5V Turbo

Competitor Comparison

Model	Arena score	Pricing (input/output per 1M tokens)
Gemini 3 Pro	~1449	N/A
Qwen3-VL-235B	N/A	N/A
GLM-4.5V	N/A	$0.60/$1.80

Overview

GLM-4.6V is a vision-language model (VLM) with 108B total and 12B active parameters, a 128K context window, and native multimodal function calling. It is released under the MIT license.
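The gap between total and active parameters is what makes the MoE design cheaper to run than a dense model of the same size. The sketch below uses the common rule of thumb of roughly 2 FLOPs per active parameter per generated token; that approximation is an assumption here, and it ignores attention over the context as well as the vision encoder.

```python
# Back-of-the-envelope comparison of MoE vs. dense compute per generated token.
# Assumption: ~2 FLOPs per active parameter per token; attention over the
# context and the vision encoder are ignored.

TOTAL_PARAMS = 108e9   # all experts
ACTIVE_PARAMS = 12e9   # parameters used per token


def flops_per_token(active_params: float) -> float:
    """Approximate forward-pass FLOPs per token as 2 * active parameters."""
    return 2 * active_params


if __name__ == "__main__":
    active_fraction = ACTIVE_PARAMS / TOTAL_PARAMS
    print(f"Active fraction: {active_fraction:.1%}")                           # 11.1%
    print(f"MoE:   ~{flops_per_token(ACTIVE_PARAMS) / 1e9:.0f} GFLOPs/token")  # ~24
    print(f"Dense: ~{flops_per_token(TOTAL_PARAMS) / 1e9:.0f} GFLOPs/token")   # ~216
```

Per-token compute therefore tracks the 12B active parameters, while memory for the weights still tracks the full 108B.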

Benchmarks & Performance

Scores 23 on the AA Intelligence Index. Visual understanding is strong, but output is slow at 36.7 tokens/s.
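To translate the reported output speed into user-facing latency, the sketch below divides the number of output tokens by 36.7 tokens/s. It assumes a constant decode speed and ignores time-to-first-token and network latency, so real response times will be somewhat longer.

```python
# Estimate wall-clock generation time from a reported output speed.
# Assumption: constant decode speed; time-to-first-token and network latency are ignored.

OUTPUT_TOKENS_PER_SEC = 36.7  # reported output speed


def generation_seconds(output_tokens: int,
                       tokens_per_sec: float = OUTPUT_TOKENS_PER_SEC) -> float:
    """Return the approximate seconds needed to stream `output_tokens` tokens."""
    return output_tokens / tokens_per_sec


if __name__ == "__main__":
    for n in (500, 1_000, 4_000):
        print(f"{n:>5} tokens: ~{generation_seconds(n):.1f} s")
```

A 1,000-token answer takes a little over 27 seconds at this speed, and the model's tendency toward verbose output stretches that further.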

Detailed Comparison

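Compared with GLM-4.5V, GLM-4.6V offers a 128K context window (vs. 32K), native function calling, and lower pricing ($0.30/$0.90 vs. $0.60/$1.80 per 1M input/output tokens). That makes it a cost-effective open-source VLM, and native function calling enables complex multimodal agents.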
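On the application side, an agent built on function calling needs a loop that executes each tool call the model emits and feeds the result back. The sketch below shows only that dispatch step, in plain Python. The JSON shape of the tool call and the `crop_image` tool are hypothetical illustrations, not the documented GLM-4.6V or Z.ai wire format; the tool returns an image reference to echo the idea of multimodal tool use.

```python
import json
from typing import Any, Callable, Dict

# Hypothetical tool registry for a multimodal agent loop.
# The tool-call JSON shape used here is an assumption for illustration,
# not the documented GLM-4.6V / Z.ai format.

TOOLS: Dict[str, Callable[..., Any]] = {}


def tool(fn: Callable[..., Any]) -> Callable[..., Any]:
    """Register a Python function so the model can call it by name."""
    TOOLS[fn.__name__] = fn
    return fn


@tool
def crop_image(image_id: str, x: int, y: int, width: int, height: int) -> Dict[str, Any]:
    """Hypothetical tool: return a reference to a cropped region of an image."""
    return {"image_id": f"{image_id}:crop", "box": [x, y, width, height]}


def dispatch(tool_call_json: str) -> str:
    """Run one model-emitted tool call and return the result as JSON text
    to send back to the model on the next turn."""
    call = json.loads(tool_call_json)
    name = call["name"]
    if name not in TOOLS:
        return json.dumps({"error": f"unknown tool: {name}"})
    result = TOOLS[name](**call["arguments"])
    return json.dumps(result)


if __name__ == "__main__":
    # A tool call as a model might emit it (hypothetical format).
    emitted = ('{"name": "crop_image", "arguments": '
               '{"image_id": "page3", "x": 10, "y": 20, "width": 200, "height": 100}}')
    print(dispatch(emitted))
```

Returning an error object for unknown tools, rather than raising, lets the model see the failure and retry within the same conversation.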

Community Feedback

Users praise its multimodal function calling but note its slow output speed.

Use Cases

Document parsing, video analysis, GUI tasks, intelligent content creation.

Latest News

GLM-5V Turbo is recommended as a newer alternative.


Sources

Z.ai Docs

Analysis generated: 2026-05-24