DeepMind | Open Source

Gemma 4 E4B (Effective-4B High-Performance On-Device Model)

Gemma 4 E4B is a multimodal foundation model developed by Google DeepMind. Although it has 8.0B total parameters (about 4.5B effective), it is designed to run efficiently and with high performance on end-user devices.

Parameters

8.0B

Context Window

128K

License

Apache 2.0

Release Date

2026-04

API Pricing

API pricing for this model is not yet available

Strengths

  • High on-device performance
  • 128K long context window
  • Flexible Apache 2.0 license

Weaknesses

  • Moderate parameter count at 8.0B
  • Multimodal processing overhead
  • Needs optimization for specific use cases

Use Cases

  • AI inference on edge devices
  • Long-form document analysis
  • Multimodal information processing

Deep Analysis

Effective Parameters

4.5B

8.0B total with Per-Layer Embeddings

MMLU Pro

69.4%

vs Gemma 3 27B: 67.6%

LiveCodeBench v6

52.0%

vs Gemma 3 27B: 29.1%

GPQA Diamond

58.6%

vs Gemma 3 27B: 42.4%

Context Window

128K tokens

native long-context support

Generation Speed

~70 t/s

RTX 4070 Ti, Q8_0, full offload

VRAM (Q4)

~8 GB

runs on consumer GPUs and Apple Silicon

License

Apache 2.0

fully open, commercial use allowed

Strengths

  • Outperforms Gemma 3 27B (6x its size) on reasoning, coding, and science benchmarks despite being an edge-tier model
  • Native audio input support alongside text and vision—unique among Gemma 4 variants, enabling on-device voice/ASR use cases
  • Extremely efficient: ~70 t/s generation on a mid-range RTX 4070 Ti with graceful degradation up to 128K context

Weaknesses

  • Unlike E2B, does not support a thinking/chain-of-thought mode, which makes it less suited to complex multi-step reasoning tasks
  • Significantly trails larger Gemma 4 siblings (31B: 89.2% AIME vs E4B: 42.5%) on hard math and coding benchmarks
  • Ollama integration quirk: E4B lacks thinking mode entirely, while E2B gets it by default, creating an inconsistent developer experience across the family

Competitor Comparison

Model             Arena   SWE   GPQA    Price
Gemma 4 E2B       N/A     N/A   43.4%   Free (self-host)
Gemma 4 26B A4B   1441    N/A   82.3%   Free (self-host)
Gemma 4 31B       1452    N/A   84.3%   Free (self-host)
Gemma 3 27B       1365    N/A   42.4%   Free (self-host)

Gemma 4 E4B is Google DeepMind's high-performance edge model in the Gemma 4 family, released April 2, 2026. With only 4.5B effective parameters (8.0B total with Per-Layer Embeddings), it outperforms the previous-generation Gemma 3 27B on general knowledge (MMLU Pro: 69.4% vs 67.6%), science reasoning (GPQA Diamond: 58.6% vs 42.4%), coding (LiveCodeBench v6: 52.0% vs 29.1%), and math (AIME: 42.5% vs 20.8%). Measured by effective parameters, a model one-sixth the size of its predecessor delivers better results on all four benchmarks.
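The comparison with Gemma 3 27B can be recomputed from the figures quoted on this page. The following is a minimal Python sketch: the scores are copied from the page rather than independently measured, the 27B figure is taken at face value from the model name, and the dictionary layout and function names are illustrative assumptions.

```python
# Scores in percent, as quoted on this page: (Gemma 4 E4B, Gemma 3 27B).
SCORES = {
    "MMLU Pro": (69.4, 67.6),
    "GPQA Diamond": (58.6, 42.4),
    "LiveCodeBench v6": (52.0, 29.1),
    "AIME": (42.5, 20.8),
}

E4B_EFFECTIVE_B = 4.5  # effective parameters, in billions (from this page)
E4B_TOTAL_B = 8.0      # total parameters incl. Per-Layer Embeddings (from this page)
GEMMA3_27B_B = 27.0    # assumption: nominal size implied by the model name


def point_gains(scores):
    """Return E4B's percentage-point gain over Gemma 3 27B per benchmark."""
    return {name: round(new - old, 1) for name, (new, old) in scores.items()}


def size_ratio(larger_b, smaller_b):
    """Return how many times larger the first model is than the second."""
    return larger_b / smaller_b


if __name__ == "__main__":
    for name, gain in point_gains(SCORES).items():
        print(f"{name}: +{gain} points")
    print(f"Ratio vs effective params: {size_ratio(GEMMA3_27B_B, E4B_EFFECTIVE_B):.1f}x")
    print(f"Ratio vs total params:     {size_ratio(GEMMA3_27B_B, E4B_TOTAL_B):.1f}x")
```

The "one-sixth the size" framing relies on the effective parameter count. Against the 8.0B total, the ratio is closer to 3.4x.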

The model is designed for deployment on consumer hardware, including laptops, smartphones, and IoT devices. On an RTX 4070 Ti with full GPU offload (Q8_0 quantization), it generates roughly 70 tokens/second, and throughput degrades gracefully as context length grows toward 128K. Its Q4 quantized version requires only about 8 GB of VRAM, and it runs comfortably on Apple Silicon unified memory. Uniquely within the Gemma 4 family, E4B supports native audio input alongside text and vision, enabling on-device speech understanding, a capability absent from the larger 26B and 31B variants.
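For capacity planning, the memory and speed figures above can be turned into rough estimates. The sketch below is a back-of-the-envelope calculation, not a measurement. It assumes weight storage scales as parameters times nominal bits per weight, and that the quoted ~70 t/s holds for the whole generation. Real quantization formats typically store slightly more than their nominal bit width, and the KV cache and runtime overhead add to the weight footprint, which presumably accounts for the gap between the 4 GB computed here and the ~8 GB the page quotes for Q4. Function names are illustrative.

```python
def weight_memory_gb(params_billions, bits_per_weight):
    """Approximate weight storage in GB (10^9 bytes).

    Ignores KV cache, activations, and per-block quantization metadata.
    """
    return params_billions * bits_per_weight / 8


def generation_seconds(num_tokens, tokens_per_second):
    """Approximate wall-clock time to generate num_tokens at a steady rate."""
    return num_tokens / tokens_per_second


if __name__ == "__main__":
    total_params_b = 8.0  # total parameters, from this page
    print(f"Weights at nominal 4-bit: {weight_memory_gb(total_params_b, 4):.1f} GB")
    print(f"Weights at nominal 8-bit: {weight_memory_gb(total_params_b, 8):.1f} GB")
    # ~70 t/s is the page's figure for an RTX 4070 Ti at Q8_0.
    print(f"700 tokens at 70 t/s: {generation_seconds(700, 70.0):.1f} s")
```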

Positioned as the recommended starting point for edge and local deployments, E4B is released under the Apache 2.0 license, which permits commercial use. Although it lacks the thinking/CoT mode available on E2B, it is reported to deliver better output quality on structured tasks (extraction, translation, commit messages) with faster effective throughput. For developers choosing within the Gemma 4 family, the guidance from Google and the community is consistent: start with E4B for edge/mobile use cases, step up to 26B A4B for workstation-grade reasoning, and move to 31B only when maximum quality is non-negotiable.
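The selection guidance above can be captured as a small lookup. This is only an illustrative Python sketch: the `Target` categories and the `recommend_variant` helper are hypothetical names introduced here, and the mapping restates the page's recommendation. It is not an official decision procedure.

```python
from enum import Enum


class Target(Enum):
    """Hypothetical deployment categories, mirroring the page's guidance."""
    EDGE = "edge"                # laptops, phones, IoT devices
    WORKSTATION = "workstation"  # workstation-grade reasoning
    MAX_QUALITY = "max_quality"  # maximum quality is non-negotiable


# Mapping taken from the recommendation stated on this page.
RECOMMENDED = {
    Target.EDGE: "Gemma 4 E4B",
    Target.WORKSTATION: "Gemma 4 26B A4B",
    Target.MAX_QUALITY: "Gemma 4 31B",
}


def recommend_variant(target: Target) -> str:
    """Return the Gemma 4 variant the page suggests for a deployment target."""
    return RECOMMENDED[target]


if __name__ == "__main__":
    for target in Target:
        print(f"{target.value}: {recommend_variant(target)}")
```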

Analysis generated: 2026-05-23