Gemma 4 E2B (effective-2B on-device model)
Gemma 4 E2B is a multimodal foundation model developed by Google DeepMind. It has about 5.1B parameters including embeddings (2.1B effective) and is designed to run efficiently on edge devices.
Parameters
5.1B
Context
128K
License
Apache 2.0
Release date
2026-04
API pricing
API pricing for this model has not been published.
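Strengths
- Strong multimodal support
- Long 128K context window
- Efficient operation on edge devices
Weaknesses
- Smaller knowledge base than larger models
- Limited complex reasoning
- Dependence on available compute resources
Use cases
- Real-time on-device processing
- Multimodal data analysis
- Long-context processing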
In-depth analysis
Parameters
2.1B (effective) / 5.1B with embeddings
Smallest model in the Gemma 4 family
Context Window
128K tokens
Per Google/HuggingFace docs (gemma4.dev reports 8K for text-only mode)
Architecture
Dense transformer
With Per-Layer Embeddings (PLE) and shared KV cache
Min VRAM (BF16)
5 GB
Or 2 GB with Q4 quantization
Multimodal
Image + Audio input
Supports vision and audio input, contrary to what some sources claim
Release Date
April 2, 2026
Part of Gemma 4 family launch
License
Apache 2.0
First Gemma with Apache 2.0
Tool Use
Yes
Supports function calling and structured output
Languages
140+
Natively multilingual
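The memory figures above can be sanity-checked with simple arithmetic: weight memory is roughly the parameter count times the bits per weight, divided by 8. The sketch below is a rough estimate only. It counts weights alone (no KV cache, activations, or runtime overhead) and treats Q4 as exactly 4 bits per weight, which is an assumption, since real 4-bit formats usually store extra scaling metadata. The function name `weight_memory_gb` is hypothetical.

```python
def weight_memory_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate memory for model weights alone, in decimal gigabytes."""
    bytes_total = params_billion * 1e9 * bits_per_weight / 8
    return bytes_total / 1e9


# Parameter counts taken from the spec sheet above.
EFFECTIVE_B = 2.1  # effective parameters, in billions
TOTAL_B = 5.1      # parameters including embeddings, in billions

for label, params in [("effective", EFFECTIVE_B), ("with embeddings", TOTAL_B)]:
    bf16 = weight_memory_gb(params, 16)  # BF16 = 16 bits per weight
    q4 = weight_memory_gb(params, 4)     # assumed: exactly 4 bits per weight
    print(f"{label}: BF16 ~ {bf16:.2f} GB, 4-bit ~ {q4:.2f} GB")
```

Under these assumptions the 2.1B effective parameters come to about 4.2 GB at BF16, which is in the neighborhood of the 5 GB figure listed above, while all 5.1B parameters at BF16 would need about 10.2 GB. One possible reading is that the listed minimum assumes the embedding tables are not all held in VRAM, but the source does not say so.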
Strengths
- Runs entirely on CPU; no GPU required for basic inference
- Needs only 2 GB of VRAM with Q4 quantization
- Multimodal: supports image and audio input despite its tiny size
- A 128K context window is exceptional for an edge model
- Apache 2.0 license for maximum deployment flexibility
- Compatible with Ollama, llama.cpp, Transformers, MLX, and WebGPU
Weaknesses
- No thinking mode support
- Limited reasoning capability compared to larger models
- Conflicting specs: some sources report an 8K context for text-only mode
- Not suitable for complex multi-step reasoning tasks
- Trades quality for extreme efficiency
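Because the reported context sizes conflict (128K versus 8K for text-only mode), a cautious deployment might check prompts against both limits before sending them. The following is a minimal sketch, not part of any Gemma tooling: the function name, the constants, and the reading of "128K" and "8K" as round token counts are all assumptions made for illustration.

```python
# Assumed token limits; the exact values behind "128K" and "8K" are not
# stated in the source, so round numbers are used here.
DOCUMENTED_CONTEXT = 128_000   # figure from the Google/Hugging Face docs
CONSERVATIVE_CONTEXT = 8_000   # text-only figure reported by one source


def fits_context(prompt_tokens: int, max_new_tokens: int, context_limit: int) -> bool:
    """Return True if the prompt plus the requested output fits in the window."""
    return prompt_tokens + max_new_tokens <= context_limit


def context_status(prompt_tokens: int, max_new_tokens: int) -> str:
    """Classify a request against both reported limits."""
    if fits_context(prompt_tokens, max_new_tokens, CONSERVATIVE_CONTEXT):
        return "safe under both reported limits"
    if fits_context(prompt_tokens, max_new_tokens, DOCUMENTED_CONTEXT):
        return "fits only if the 128K figure applies"
    return "exceeds both reported limits"


print(context_status(prompt_tokens=6_000, max_new_tokens=512))
print(context_status(prompt_tokens=40_000, max_new_tokens=1_024))
```

Requests in the middle category are the ones worth testing on the actual runtime before relying on them.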
Competitor comparison
| Model | Arena score | SWE-bench | GPQA | Price |
|---|---|---|---|---|
| Gemma 4 E2B (2.1B) | N/A | N/A | N/A | Free (open weights) |
| Gemma 4 E4B (4.5B) | ~1300 (est) | N/A | ~50% (est) | Free (open weights) |
| Phi-3.5 Mini (3.8B) | ~1100 | N/A | ~55% | Free (open weights) |
| SmolLM2 (1.7B) | N/A | N/A | N/A | Free (open weights) |
| Qwen2.5 (3B) | ~1050 | N/A | ~45% | Free (open weights) |
Gemma 4 E2B is the smallest model in the Gemma 4 family, with 2.1B effective parameters (5.1B with embeddings). It can run entirely on CPU, or on a GPU with as little as 2 GB of VRAM using Q4 quantization. It supports multimodal input (image and audio) and has a 128K context window. It was released on April 2, 2026 under the Apache 2.0 license.
Sources
Analysis generated: 2026-05-24