GLM-5.1 REAP Series Models Released, Offering Multiple Quantization and Pruning Variants

ME News message: On April 22 (UTC+8), the GLM-5.1 REAP series of models was released. The series is derived from GLM-5.1, a 744-billion-parameter model distributed in BF16. It is produced by applying REAP pruning together with several quantization techniques so that the model can run on a wider range of hardware.

REAP pruning evaluates each expert's contribution in the mixture-of-experts model, removes the experts that contribute least, and renumbers the routing gates of the remaining experts to minimize quality loss.

The series offers several core variants in BF16, NVFP4, GPTQ W4A16, and GGUF formats, with model sizes ranging from approximately 285 GB to 1,125 GB. The variants target different hardware, including GPUs based on NVIDIA's Hopper, Ampere, and Blackwell architectures, as well as CPUs. All models are released under the MIT license and can be deployed with inference engines such as sglang, vLLM, or llama.cpp. (Source: InFoQ)
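The REAP pruning step can be illustrated with a short sketch. The article states only three things about it: each expert is scored, the lowest-contributing experts are removed, and the routing gates are renumbered. The scoring rule below is an assumption made for illustration. It takes the mean, over calibration tokens routed to an expert, of the router probability times the L2 norm of that expert's output, and is not necessarily the exact criterion used to build the GLM-5.1 REAP models. The function names `expert_saliency` and `prune_experts` and the toy data are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vector = List[float]


def l2_norm(v: Sequence[float]) -> float:
    return math.sqrt(sum(x * x for x in v))


def expert_saliency(
    gate_probs: List[List[float]],        # [num_tokens][num_experts] router probabilities
    expert_outputs: List[List[Vector]],   # [num_tokens][num_experts] expert output vectors
    top_k: int,
) -> List[float]:
    """Assumed score: mean of gate_prob * ||expert output|| over tokens routed to the expert."""
    num_experts = len(gate_probs[0])
    totals = [0.0] * num_experts
    counts = [0] * num_experts
    for probs, outputs in zip(gate_probs, expert_outputs):
        # Top-k routing: only the k highest-probability experts process this token.
        chosen = sorted(range(num_experts), key=lambda e: probs[e], reverse=True)[:top_k]
        for e in chosen:
            totals[e] += probs[e] * l2_norm(outputs[e])
            counts[e] += 1
    # Experts that never receive a token score 0.0, so they are pruned first.
    return [t / c if c else 0.0 for t, c in zip(totals, counts)]


def prune_experts(
    router_weights: List[Vector],   # one gate row per expert: [num_experts][d_model]
    scores: List[float],
    num_keep: int,
) -> Tuple[List[Vector], Dict[int, int]]:
    """Keep the num_keep highest-scoring experts and renumber them 0..num_keep-1."""
    ranked = sorted(range(len(scores)), key=lambda e: scores[e], reverse=True)
    kept = sorted(ranked[:num_keep])               # preserve original relative order
    old_to_new = {old: new for new, old in enumerate(kept)}
    new_router = [router_weights[old] for old in kept]  # gate rows for surviving experts only
    return new_router, old_to_new


# Toy example (hypothetical data): 4 experts, top-2 routing, 3 calibration tokens.
gate_probs = [
    [0.40, 0.35, 0.05, 0.20],
    [0.10, 0.20, 0.30, 0.40],
    [0.30, 0.40, 0.10, 0.20],
]
# Each expert emits a fixed vector here, with norms 1, 5, 1 and 10.
fixed_outputs = [[1.0, 0.0], [3.0, 4.0], [0.0, 1.0], [6.0, 8.0]]
expert_outputs = [fixed_outputs for _ in gate_probs]
router = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6], [0.7, 0.8]]

scores = expert_saliency(gate_probs, expert_outputs, top_k=2)
new_router, old_to_new = prune_experts(router, scores, num_keep=2)
print(old_to_new)  # {1: 0, 3: 1}
print(new_router)  # [[0.3, 0.4], [0.7, 0.8]]
```

In this toy run, experts 3 and 1 score highest and survive as new experts 0 and 1. The router is left with two gate rows, so later top-k routing only chooses among the remaining experts. A real pipeline would also delete the pruned experts' weight tensors and would typically run calibration over many tokens. The sketch leaves both of those out.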