DeepMind · Open source

Gemma 4 E4B (effective-4B, high-performance on-device model)

Gemma 4 E4B is a multimodal foundation model developed by DeepMind. Although it has 8.0B parameters in total, it is designed to run efficiently and with high performance on-device.

Parameters

8.0B

Context

128K

License

Apache 2.0

Release date

2026-04

API pricing

API pricing for this model has not been published.

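Strengths

  • High on-device performance
  • Long 128K context window
  • Permissive Apache 2.0 license

Weaknesses

  • Moderate parameter count of 8.0B
  • Multimodal processing overhead
  • Requires optimization for specific use cases

Use cases

  • AI inference on edge devices
  • Long-document analysis
  • Multimodal information processing

In-depth analysis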

Effective Parameters

4.5B

8.0B total with Per-Layer Embeddings

MMLU Pro

69.4%

vs Gemma 3 27B: 67.6%

LiveCodeBench v6

52.0%

vs Gemma 3 27B: 29.1%

GPQA Diamond

58.6%

vs Gemma 3 27B: 42.4%

Context Window

128K tokens

native long-context support

Generation Speed

~70 t/s

RTX 4070 Ti, Q8_0, full offload

VRAM (Q4)

~8 GB

runs on consumer GPUs and Apple Silicon

License

Apache 2.0

fully open, commercial use allowed

Strengths

  • Outperforms Gemma 3 27B (6x its effective parameter count) on reasoning, coding, and science benchmarks despite being an edge-tier model
  • Native audio input alongside text and vision (unique among Gemma 4 variants), enabling on-device voice/ASR use cases
  • Extremely efficient: ~70 t/s generation on a mid-range RTX 4070 Ti (Q8_0, full offload) with graceful degradation up to 128K context

Weaknesses

  • Unlike E2B, does not support a thinking/chain-of-thought mode, so it is less suited to complex multi-step reasoning tasks
  • Significantly trails larger Gemma 4 siblings on hard math and coding benchmarks (AIME: 89.2% for 31B vs 42.5% for E4B)
  • Ollama integration quirk: E4B lacks thinking mode entirely while E2B gets it by default, creating an inconsistent developer experience across the family

Competitor comparison

Model             Arena   SWE   GPQA    Price
Gemma 4 E2B       N/A     N/A   43.4%   Free (self-host)
Gemma 4 26B A4B   1441    N/A   82.3%   Free (self-host)
Gemma 4 31B       1452    N/A   84.3%   Free (self-host)
Gemma 3 27B       1365    N/A   42.4%   Free (self-host)

Gemma 4 E4B is Google DeepMind's high-performance edge model in the Gemma 4 family, released April 2, 2026. With only 4.5B effective parameters (8.0B total with Per-Layer Embeddings), it outperforms the previous-generation Gemma 3 27B on reasoning (GPQA Diamond: 58.6% vs 42.4%), coding (LiveCodeBench v6: 52.0% vs 29.1%), and math (AIME: 42.5% vs 20.8%) benchmarks. A model with one-sixth the effective parameter count of its predecessor delivers better results on all three, which is a substantial generational gain in efficiency.
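The size and score comparison can be checked with a few lines of arithmetic. The sketch below uses only figures quoted in this analysis; the data layout and function names are illustrative assumptions and are not part of any Gemma tooling.

```python
# Scores (percent) as quoted in this analysis. The dictionary layout is
# an illustrative assumption, not an official data format.
SCORES = {
    "MMLU Pro": {"Gemma 4 E4B": 69.4, "Gemma 3 27B": 67.6},
    "LiveCodeBench v6": {"Gemma 4 E4B": 52.0, "Gemma 3 27B": 29.1},
    "GPQA Diamond": {"Gemma 4 E4B": 58.6, "Gemma 3 27B": 42.4},
    "AIME": {"Gemma 4 E4B": 42.5, "Gemma 3 27B": 20.8},
}

E4B_EFFECTIVE_PARAMS_B = 4.5   # effective parameters, in billions
E4B_TOTAL_PARAMS_B = 8.0       # total, including Per-Layer Embeddings
GEMMA3_PARAMS_B = 27.0


def size_ratio(larger_b: float, smaller_b: float) -> float:
    """How many times larger the first model is than the second."""
    return larger_b / smaller_b


def point_gains(scores: dict) -> dict:
    """Percentage-point gain of Gemma 4 E4B over Gemma 3 27B per benchmark."""
    return {
        bench: round(s["Gemma 4 E4B"] - s["Gemma 3 27B"], 1)
        for bench, s in scores.items()
    }


if __name__ == "__main__":
    by_effective = size_ratio(GEMMA3_PARAMS_B, E4B_EFFECTIVE_PARAMS_B)
    by_total = size_ratio(GEMMA3_PARAMS_B, E4B_TOTAL_PARAMS_B)
    print(f"Gemma 3 27B vs E4B effective params: {by_effective:.1f}x")
    print(f"Gemma 3 27B vs E4B total params:     {by_total:.1f}x")
    for bench, gain in point_gains(SCORES).items():
        print(f"{bench}: +{gain} points")
```

The "one-sixth" figure holds for effective parameters (27 / 4.5 = 6). Measured against the 8.0B total, the ratio is closer to 3.4x, so it is worth stating which count a comparison uses.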

The model is designed for deployment on consumer hardware including laptops, smartphones, and IoT devices. On an RTX 4070 Ti with full GPU offload (Q8_0 quantization), it achieves ~70 tokens/second generation, degrading gracefully as context length grows toward 128K. Its Q4-quantized version requires only ~8 GB of VRAM, and it runs comfortably on Apple Silicon unified memory. Within the Gemma 4 family, E4B also supports native audio input (alongside text and vision), enabling on-device speech understanding, a capability absent from the larger 26B and 31B variants.
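As a rough sanity check on these hardware figures, the sketch below estimates the size of the weights alone and the time to generate a response at the quoted speed. It assumes a nominal 4 or 8 bits per weight; real quantization formats store extra scale data per block, and runtime memory such as the KV cache is not modeled, so this does not attempt to reproduce the ~8 GB figure. The function names are hypothetical.

```python
def weight_memory_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate size of the weights alone, in GB (1 GB = 1e9 bytes).

    Assumption: every parameter is stored at the same nominal bit width.
    KV cache, activations, and runtime buffers are NOT included.
    """
    total_bytes = params_billion * 1e9 * bits_per_weight / 8
    return total_bytes / 1e9


def generation_seconds(num_tokens: int, tokens_per_second: float) -> float:
    """Time to generate num_tokens at a steady decode speed."""
    return num_tokens / tokens_per_second


if __name__ == "__main__":
    total_params_b = 8.0  # total parameters quoted in this analysis
    print(f"4-bit weights: ~{weight_memory_gb(total_params_b, 4):.1f} GB")
    print(f"8-bit weights: ~{weight_memory_gb(total_params_b, 8):.1f} GB")
    # ~70 t/s is the quoted RTX 4070 Ti figure.
    print(f"1000 tokens at 70 t/s: ~{generation_seconds(1000, 70.0):.1f} s")
```

Under these assumptions, 8.0B parameters at a nominal 4 bits come to about 4 GB of weights, so the quoted VRAM requirement leaves headroom beyond the weights themselves.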

Positioned as the recommended starting point for edge and local deployments, E4B uses Apache 2.0 licensing, which permits commercial use without additional restrictions. While it lacks the thinking/CoT mode available on E2B, benchmarks show it delivers superior output quality on structured tasks (extraction, translation, commit messages) with faster effective throughput. For developers choosing within the Gemma 4 family, the guidance from Google and the community is consistent: start with E4B for edge/mobile use cases, step up to 26B A4B for workstation-grade reasoning, and only pursue 31B when maximum quality is non-negotiable.
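The selection guidance above can be written down as a small lookup. This is only a sketch of the rule of thumb stated in this analysis; the `Target` enum and `recommend_model` function are hypothetical names, and the returned strings are display names rather than identifiers for any particular runtime.

```python
from enum import Enum


class Target(Enum):
    """Deployment targets named in the guidance (hypothetical enum)."""
    EDGE = "edge"                # edge / mobile / local deployment
    WORKSTATION = "workstation"  # workstation-grade reasoning
    MAX_QUALITY = "max_quality"  # quality is non-negotiable


# Mapping taken directly from the recommendation in this analysis.
RECOMMENDED = {
    Target.EDGE: "Gemma 4 E4B",
    Target.WORKSTATION: "Gemma 4 26B A4B",
    Target.MAX_QUALITY: "Gemma 4 31B",
}


def recommend_model(target: Target) -> str:
    """Return the Gemma 4 variant suggested for a deployment target."""
    return RECOMMENDED[target]


if __name__ == "__main__":
    for target in Target:
        print(f"{target.value}: {recommend_model(target)}")
```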

Analysis generated: 2026-05-23