GLM-5.1 REAP Series Models Released, Offering Multiple Quantization and Pruning Variants

ME News reported on April 22 (UTC+8) that the GLM-5.1 REAP series of models has been released. The series is derived from GLM-5.1, a 744-billion-parameter model in BF16.
The series combines REAP pruning with several quantization techniques so that the models can fit a range of hardware.
REAP pruning scores how much each expert in the mixture-of-experts (MoE) model contributes, removes the lowest-scoring experts, and renumbers the router gates to match the remaining experts, with the aim of minimizing quality loss.
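The report does not spell out the scoring rule used for these checkpoints. As a minimal sketch, assuming a REAP-style (router-weighted expert activation) saliency, each expert's score below is the average, over the calibration tokens routed to it, of its router gate value times the L2 norm of its output. The lowest-scoring experts are dropped, and the surviving experts, along with their rows of the router weight matrix, are renumbered contiguously. The function names, the top-k routing setup, and the toy numbers are hypothetical. A real implementation would operate per layer on tensors collected from calibration data.

```python
from typing import List, Sequence, Tuple


def expert_saliency(
    gate_weights: Sequence[Sequence[float]],
    expert_out_norms: Sequence[Sequence[float]],
    top_k: int,
) -> List[float]:
    """Hypothetical REAP-style score per expert.

    gate_weights[t][j]     -- router probability of expert j for token t
    expert_out_norms[t][j] -- L2 norm of expert j's output for token t
    Score = mean of gate * norm over the tokens whose top-k includes j.
    """
    num_experts = len(gate_weights[0])
    totals = [0.0] * num_experts
    counts = [0] * num_experts
    for gates, norms in zip(gate_weights, expert_out_norms):
        # Experts actually selected for this token (top-k by gate value).
        chosen = sorted(range(num_experts), key=lambda j: gates[j], reverse=True)[:top_k]
        for j in chosen:
            totals[j] += gates[j] * norms[j]
            counts[j] += 1
    # Experts that are never selected contribute nothing and score 0.
    return [t / c if c else 0.0 for t, c in zip(totals, counts)]


def prune_experts(
    saliency: Sequence[float],
    router_rows: Sequence[Sequence[float]],
    num_to_remove: int,
) -> Tuple[List[int], List[Sequence[float]]]:
    """Drop the lowest-scoring experts and renumber the survivors.

    Returns (kept, new_router): new expert id i is old expert kept[i], and
    new_router[i] is that expert's row of the router (gate) weight matrix.
    """
    by_score = sorted(range(len(saliency)), key=lambda j: saliency[j])
    removed = set(by_score[:num_to_remove])
    kept = [j for j in range(len(saliency)) if j not in removed]
    new_router = [router_rows[j] for j in kept]
    return kept, new_router


if __name__ == "__main__":
    # Toy data: 3 tokens, 4 experts, top-2 routing (all values hypothetical).
    gates = [[0.6, 0.3, 0.1, 0.0], [0.5, 0.1, 0.4, 0.0], [0.2, 0.7, 0.1, 0.0]]
    norms = [[1.0, 2.0, 1.0, 5.0], [2.0, 1.0, 1.0, 5.0], [1.0, 1.0, 3.0, 5.0]]
    scores = expert_saliency(gates, norms, top_k=2)
    print([round(s, 3) for s in scores])  # [0.6, 0.65, 0.4, 0.0]
    kept, router = prune_experts(scores, [[1.0], [2.0], [3.0], [4.0]], num_to_remove=2)
    print(kept)  # [0, 1]
```

In the toy run, expert 3 scores zero even though its output norms are the largest, because the router never selects it. Weighting by the gate value is what lets such rarely used experts be identified as safe to remove.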
The core variants come in BF16, NVFP4, GPTQ W4A16, and GGUF formats, range from roughly 285 GB to 1,125 GB in size, and target different GPU architectures, including NVIDIA Hopper, Ampere, and Blackwell, among others.
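The size range follows largely from how many bits each weight occupies. The estimate below is a back-of-the-envelope sketch, not a description of the released files: it counts only weight storage, ignores layers kept at higher precision, file metadata, and runtime memory such as the KV cache, and models a block-scaled 4-bit format as 4 bits per value plus one shared scale per group. The group sizes and scale widths in the example (an 8-bit scale per 16 values, roughly as in NVFP4; a 16-bit scale per 128 values, a common GPTQ setting) are assumptions. The 744-billion figure is the unpruned parameter count, so pruned variants would come in lower.

```python
def effective_bits(value_bits: float, scale_bits: float = 0.0, group_size: int = 1) -> float:
    """Bits per weight when every `group_size` values share one `scale_bits` scale."""
    return value_bits + scale_bits / group_size


def weight_storage_gb(num_params: float, bits_per_weight: float) -> float:
    """Approximate weight storage in GB (1 GB = 10**9 bytes); ignores metadata."""
    return num_params * bits_per_weight / 8 / 1e9


if __name__ == "__main__":
    params = 744e9  # unpruned GLM-5.1 parameter count, per the report
    print(weight_storage_gb(params, 16))  # BF16: 1488.0
    # Assumed 4-bit layouts (hypothetical group/scale settings, see lead-in):
    print(weight_storage_gb(params, effective_bits(4, 8, 16)))    # 418.5
    print(weight_storage_gb(params, effective_bits(4, 16, 128)))  # 383.625
```

For scale, the unpruned BF16 weights alone would take about 1,488 GB, above the largest listed variant, so the smaller sizes presumably reflect both expert pruning and lower-bit formats.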
All models are released under the MIT License and can be deployed with inference engines such as SGLang, vLLM, or llama.cpp.
(Source: InfoQ)
This page may contain third-party content, which is provided for information purposes only (not representations/warranties) and should not be considered as an endorsement of its views by Gate, nor as financial or professional advice. See Disclaimer for details.