Google DeepMind open-sources the Gemma 4 multimodal model family

ME News Report, April 3rd (UTC+8): Google DeepMind has open-sourced the Gemma 4 multimodal model family. The series accepts text and image inputs (the smaller models also accept audio), generates text outputs, ships in pre-trained and instruction-tuned variants, offers context windows of up to 256K tokens, and supports more than 140 languages. The family spans both dense and Mixture-of-Experts (MoE) architectures across four sizes: E2B, E4B, 26B A4B, and 31B. Core capabilities include high-performance inference, scalable multimodal processing, on-device optimization, longer context windows, improved coding and agentic capabilities, and native system-prompt support.

Technically, the models employ a hybrid attention mechanism in which the global layers use unified key-value pairs and scaled RoPE (p-RoPE). Notably, the E2B and E4B models use Per-Layer Embedding (PLE), so their effective parameter count is lower than their total parameter count, while the 26B A4B MoE model activates only 3.8B parameters during inference, achieving speeds close to those of a 4B-parameter model. (Source: InfoQ)
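The efficiency claim for the 26B A4B model rests on sparse expert routing: each token passes through only a few experts, so per-token compute tracks the active parameters rather than the 26B total. The sketch below illustrates the generic top-k MoE pattern; the expert count, dimensions, and `top_k` value are illustrative placeholders, not Gemma's actual configuration.

```python
import numpy as np

rng = np.random.default_rng(0)

d_model, d_ff = 64, 256      # toy dimensions, not Gemma's
num_experts, top_k = 8, 2    # route each token to 2 of 8 experts

# One feed-forward "expert" = two weight matrices.
experts = [
    (rng.standard_normal((d_model, d_ff)) * 0.02,
     rng.standard_normal((d_ff, d_model)) * 0.02)
    for _ in range(num_experts)
]
router = rng.standard_normal((d_model, num_experts)) * 0.02

def moe_layer(x):
    """x: (tokens, d_model). Each token runs through only top_k experts,
    so compute (the 'active' parameters) stays far below the total."""
    logits = x @ router                              # (tokens, num_experts)
    topk = np.argsort(logits, axis=-1)[:, -top_k:]   # indices of best experts
    out = np.zeros_like(x)
    for t in range(x.shape[0]):
        sel = topk[t]
        # Softmax over the selected experts' logits only.
        w = np.exp(logits[t, sel] - logits[t, sel].max())
        w /= w.sum()
        for weight, e in zip(w, sel):
            w1, w2 = experts[e]
            out[t] += weight * (np.maximum(x[t] @ w1, 0) @ w2)  # ReLU FFN
    return out

tokens = rng.standard_normal((4, d_model))
y = moe_layer(tokens)

total_params = num_experts * 2 * d_model * d_ff
active_params = top_k * 2 * d_model * d_ff
print(y.shape, f"active/total FFN params per token: {active_params}/{total_params}")
```

Because only the selected experts' weights are touched per token, inference cost resembles that of a dense model sized at the active parameter count, which is why a 26B-total model can run at roughly 4B-model speeds.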

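The mention of scaled RoPE points at the standard trick for stretching context windows: scaling the rotation angles so that longer sequences stay within the positional range seen during training. Below is a minimal, generic RoPE sketch with such a scale factor; the article does not specify how p-RoPE is formulated, so this illustrates the general mechanism only, not DeepMind's implementation.

```python
import numpy as np

def rope(x, positions, base=10000.0, scale=1.0):
    """x: (seq, d) with d even. Rotates channel pairs by angles
    position * inv_freq / scale; scale > 1 compresses positions so a
    longer sequence reuses the rotation range seen during training."""
    seq, d = x.shape
    inv_freq = 1.0 / (base ** (np.arange(0, d, 2) / d))   # (d/2,)
    angles = np.outer(positions / scale, inv_freq)        # (seq, d/2)
    cos, sin = np.cos(angles), np.sin(angles)
    x1, x2 = x[:, 0::2], x[:, 1::2]
    out = np.empty_like(x)
    out[:, 0::2] = x1 * cos - x2 * sin
    out[:, 1::2] = x1 * sin + x2 * cos
    return out

q = np.random.default_rng(0).standard_normal((8, 16))
q_rot = rope(q, np.arange(8), scale=4.0)   # 4x position interpolation
print(q_rot.shape)
```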