Google DeepMind open-sources the Gemma 4 multimodal model family

ME News report: On April 3 (UTC+8), Google DeepMind open-sourced the Gemma 4 multimodal model family. The series accepts text and image inputs (the smaller models also accept audio), generates text output, and ships in both pre-trained and instruction-tuned variants. The context window reaches 256K tokens, and more than 140 languages are supported. The family spans two architectures, dense and mixture-of-experts (MoE), in four sizes: E2B, E4B, 26B A4B, and 31B. Key capabilities include high-performance inference, scalable multimodal processing, on-device optimization, an expanded context window, improved coding and agentic capabilities, and native system-prompt support.

On the technical side, the models use a hybrid attention mechanism in which the global layers share key-value pairs and use a scaled RoPE variant (p-RoPE). The E2B and E4B models use per-layer embedding (PLE), so their effective parameter count is lower than their total parameter count. The 26B A4B MoE model activates only about 3.8B parameters at inference time, giving it a runtime speed close to that of a 4B-parameter dense model. (Source: InfoQ)
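
For the instruction-tuned variants, text-plus-image inference would presumably look like any other image-text-to-text model in Hugging Face transformers. The sketch below uses the standard transformers pipeline API; the repo id "google/gemma-4-e4b-it" is hypothetical, since the report does not give repository names.

```python
# A minimal multimodal inference sketch, assuming the weights are
# published on the Hugging Face Hub. The repo id is hypothetical;
# the pipeline API shown is generic, not Gemma-4-specific.
from transformers import pipeline

pipe = pipeline(
    "image-text-to-text",
    model="google/gemma-4-e4b-it",  # hypothetical repo id
)

messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://example.com/photo.jpg"},
            {"type": "text", "text": "Describe this image in one sentence."},
        ],
    }
]
out = pipe(text=messages, max_new_tokens=64)
print(out[0]["generated_text"][-1]["content"])  # assistant reply
```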
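
The report does not specify what the "p-RoPE" scaling scheme is, so the following is only a generic illustration of rotary position embeddings with a linear position-scaling factor, the common trick for stretching a pretrained context window toward lengths like 256K. The function names and the scale value are illustrative assumptions.

```python
# A back-of-the-envelope sketch of RoPE with linear position scaling.
# This is NOT the published p-RoPE method, whose details the article
# does not give; it only shows the general mechanism being scaled.
import numpy as np

def rope_angles(positions, head_dim, base=10_000.0, scale=1.0):
    """Rotation angle per (position, frequency pair).

    scale > 1 compresses positions (position / scale), which lets a
    model trained on short contexts address much longer ones.
    """
    freqs = base ** (-np.arange(0, head_dim, 2) / head_dim)
    return np.outer(positions / scale, freqs)

def apply_rope(x, angles):
    """Rotate consecutive (even, odd) feature pairs of x by angles."""
    x1, x2 = x[..., 0::2], x[..., 1::2]
    cos, sin = np.cos(angles), np.sin(angles)
    out = np.empty_like(x)
    out[..., 0::2] = x1 * cos - x2 * sin
    out[..., 1::2] = x1 * sin + x2 * cos
    return out

# Toy query tensor: 8 positions, head_dim 16, scaled as if for long context.
q = np.random.randn(8, 16)
q_rot = apply_rope(q, rope_angles(np.arange(8), head_dim=16, scale=8.0))
print(q_rot.shape)  # (8, 16)
```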
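
The 26B-total / ~3.8B-active figure is the usual MoE trade-off: a router sends each token to a few experts, so only those experts' parameters participate in the forward pass. The toy sketch below shows top-k routing with made-up sizes; the expert count, top-k, and dimensions are illustrative assumptions, not Gemma 4's actual configuration.

```python
# Why a 26B-total MoE can run like a ~4B dense model: per token,
# only the routed experts' weights are used. All sizes here are
# illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)
d_model, n_experts, top_k = 64, 16, 2

# One tiny MLP "expert" per slot: d_model -> d_model.
experts = [rng.standard_normal((d_model, d_model)) * 0.02
           for _ in range(n_experts)]
router = rng.standard_normal((d_model, n_experts)) * 0.02

def moe_forward(x):
    """Route each token to its top_k experts and mix their outputs."""
    logits = x @ router                            # (tokens, n_experts)
    top = np.argsort(logits, axis=-1)[:, -top_k:]  # chosen expert ids
    gates = np.exp(logits - logits.max(-1, keepdims=True))
    gates /= gates.sum(-1, keepdims=True)          # softmax gate
    out = np.zeros_like(x)
    for t in range(x.shape[0]):
        for e in top[t]:                           # only top_k experts run
            out[t] += gates[t, e] * (x[t] @ experts[e])
    return out

tokens = rng.standard_normal((4, d_model))
print(moe_forward(tokens).shape)  # (4, 64)
print(f"expert MLPs active per token: {top_k}/{n_experts} = {top_k/n_experts:.0%}")
```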
