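What are this model's strengths?

A large 160B parameter count, openness under the MIT license, and advanced visual understanding.

What are this model's weaknesses?

A context length limited to 4K, a large 35.8GB file size, and high compute requirements.

Which use cases is it best suited for?

Advanced image analysis and understanding, extraction and processing of visual information, and multimodal AI development.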


Zhipu AI | Open source

GLM-Image

Name: GLM-Image
Author: Zhipu AI

GLM-Image is a large vision model developed by Zhipu AI. It has roughly 160B parameters and is an open multimodal model released under the MIT license.

Parameters

160.0B

Context length

4K

License

MIT

Release date

2026-01-14

API pricing

API pricing for this model has not been published.

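Strengths

・Large 160B parameter count
・Openness under the MIT license
・Advanced visual understanding

Weaknesses

・Context length limited to 4K
・Large 35.8GB file size
・High compute requirements

Use cases

・Advanced image analysis and understanding
・Extraction and processing of visual information
・Multimodal AI development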

In-depth analysis

Architecture

Autoregressive (9B) + Diffusion (7B)

CVTG-2K Word Accuracy

0.9116

#1 open-source

LongText-Bench EN

0.9524

#1 open-source

LongText-Bench CN

0.9788

#1 open-source

Price

$0.015 per image

License

Apache 2.0

Release Date

January 9, 2026

Strengths

・Open-source SOTA text rendering (#1 CVTG-2K, LongText-Bench)
・Hybrid architecture combines semantics + detail
・Excels at knowledge-intensive generation
・Very affordable ($0.015/image)

Weaknesses

・General quality matches but doesn't surpass mainstream models
・Max 2048px resolution
・Smaller community (912 GitHub stars)

Competitor comparison

Model	Price
DALL-E 3	$0.04-$0.08/image
Midjourney v6	Subscription
Stable Diffusion 3	Free (self-host)
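
To make the price gap concrete, the following minimal sketch compares batch costs using only the per-image figures quoted on this page ($0.015 for GLM-Image, $0.04-$0.08 for DALL-E 3). The function names `batch_cost` and `compare` are hypothetical helpers introduced for illustration; subscription and self-hosted options are left out because the page gives no per-image figure for them.

```python
# Per-image prices in USD as quoted on this page.
GLM_IMAGE_PRICE = 0.015
DALLE3_PRICE_RANGE = (0.04, 0.08)  # listed as a range, low to high


def batch_cost(num_images: int, price_per_image: float) -> float:
    """Return the total cost in USD for a batch, rounded to cents."""
    if num_images < 0:
        raise ValueError("num_images must be non-negative")
    return round(num_images * price_per_image, 2)


def compare(num_images: int) -> dict:
    """Return batch costs for GLM-Image and both ends of the DALL-E 3 range."""
    low, high = DALLE3_PRICE_RANGE
    return {
        "glm_image": batch_cost(num_images, GLM_IMAGE_PRICE),
        "dalle3_low": batch_cost(num_images, low),
        "dalle3_high": batch_cost(num_images, high),
    }


if __name__ == "__main__":
    for name, cost in compare(1000).items():
        print(f"{name}: ${cost:.2f}")
```

Under these listed prices, a batch of 1,000 images costs $15 with GLM-Image versus $40 to $80 with DALL-E 3.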

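Overview

GLM-Image is a hybrid autoregressive + diffusion image generation model. It ranks first among open-source models in text rendering accuracy. It is priced at $0.015 per image and licensed under Apache 2.0.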

Benchmarks & performance

CVTG-2K word accuracy: 0.9116 (#1 among open-source models). LongText-Bench: 0.9524 (EN) / 0.9788 (CN), also #1 among open-source models.
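
This page does not say how word accuracy is scored. As a rough intuition only, the sketch below assumes a simplified definition: the fraction of words requested in the prompt that also appear in the text read back from the generated image (for example by an OCR step, which is not shown). The function `word_accuracy` is a hypothetical helper and is not the official CVTG-2K scoring code.

```python
from collections import Counter


def word_accuracy(target_words, recognized_words):
    """Fraction of target words found in the recognized text.

    Assumed simplification: matching is case-insensitive, and each
    recognized word can satisfy at most one target word.
    """
    if not target_words:
        raise ValueError("target_words must not be empty")
    remaining = Counter(w.lower() for w in recognized_words)
    hits = 0
    for word in target_words:
        key = word.lower()
        if remaining[key] > 0:
            remaining[key] -= 1  # consume the match so it is not reused
            hits += 1
    return hits / len(target_words)


if __name__ == "__main__":
    # "Sale" was rendered as "sole", so two of three words match.
    score = word_accuracy(["Grand", "Opening", "Sale"],
                          ["grand", "opening", "sole"])
    print(f"{score:.4f}")
```

Under this reading, a score of 0.9116 would mean that roughly nine in ten requested words were rendered legibly and correctly.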

Detailed comparison

GLM-Image occupies a distinct niche in text-heavy image generation. At $0.015 per image it is cheaper than DALL-E 3 ($0.04-$0.08 per image), with better text accuracy.

Community reception

912 GitHub stars. The innovative hybrid architecture has drawn attention.

Use cases

Posters, scientific diagrams, presentation slides, and other knowledge-intensive image generation that requires accurate text.