PrismML launches 1.58-bit Ternary Bonsai models, cutting memory use to one-ninth and beating peers on intelligence density


ME News, April 17 (UTC+8): According to monitoring by Dongcha Beating, PrismML has released the Ternary Bonsai series of language models. By using 1.58-bit (ternary) weights, the models cut VRAM usage to one-ninth of that of a 16-bit model while maintaining high performance.

The series comes in three sizes: 8B, 4B, and 1.7B parameters. The models are now open-sourced on Hugging Face and run natively on Apple devices.

A 1.58-bit model restricts each neural network weight to one of three values: {-1, 0, +1}. Compared with earlier 1-bit models, which pursued extreme compression with weights of only {-1, +1}, adding the “0” value lets the model drop redundant connections, so it retains complex reasoning ability in an extremely small footprint.
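
The "1.58-bit" label matches the information content of a three-valued weight, log2(3) ≈ 1.585 bits. The sketch below computes that figure and maps a few example floats to {-1, 0, +1}. The `ternarize` helper and its mean-absolute-value scaling are assumptions made for illustration only; the article does not describe how PrismML actually quantizes its weights.

```python
import math

# Information carried by one weight that can take exactly three values.
BITS_PER_TERNARY_WEIGHT = math.log2(3)  # roughly 1.585


def ternarize(weights):
    """Map floats to {-1, 0, +1} with an illustrative scaling rule.

    Hypothetical helper: the scale is the mean absolute weight, and each
    weight is divided by it, rounded, and clamped. This is an assumed
    scheme for demonstration, not PrismML's published method.
    """
    scale = sum(abs(w) for w in weights) / len(weights)
    if scale == 0:
        return [0] * len(weights), 0.0
    ternary = [max(-1, min(1, round(w / scale))) for w in weights]
    return ternary, scale


# Small weights collapse to 0, which is the "removed connection" case.
ternary, scale = ternarize([0.9, -0.05, -1.3, 0.02, 0.4])
print(ternary, round(BITS_PER_TERNARY_WEIGHT, 3))
```

In this toy input the two near-zero weights become 0, while the larger ones keep only their sign.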

The Ternary Bonsai 8B weight file is only 1.75 GB, and the model's average benchmark score reaches 75.5. That is 5 points higher than its own 1-bit version, and it significantly outperforms comparable dense models such as Qwen3 in “intelligence density” (performance per GB of VRAM).
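
The figures above can be checked with simple arithmetic. The sketch below assumes a nominal 8 billion parameters, decimal gigabytes, and the reported file size as a stand-in for VRAM use; none of these are confirmed by the article. The function names are hypothetical.

```python
def weight_size_gb(num_params, bits_per_weight):
    """Raw weight storage in decimal GB, ignoring any format overhead."""
    return num_params * bits_per_weight / 8 / 1e9


def intelligence_density(score, size_gb):
    """Benchmark points per GB, following the article's definition."""
    return score / size_gb


PARAMS_8B = 8e9        # assumption: nominal count, real count may differ
REPORTED_GB = 1.75     # weight file size reported in the article
REPORTED_SCORE = 75.5  # average benchmark score reported in the article

fp16_gb = weight_size_gb(PARAMS_8B, 16)       # 16-bit baseline
ideal_gb = weight_size_gb(PARAMS_8B, 1.58)    # pure 1.58-bit lower bound
compression = fp16_gb / REPORTED_GB           # close to the stated 9x
density = intelligence_density(REPORTED_SCORE, REPORTED_GB)

print(f"16-bit: {fp16_gb:.2f} GB, ideal ternary: {ideal_gb:.2f} GB")
print(f"compression: {compression:.2f}x, density: {density:.1f} points/GB")
```

Under these assumptions the 16-bit baseline is about 16 GB, so a 1.75 GB file is roughly a ninefold reduction and works out to about 43 benchmark points per GB. The reported file is slightly larger than the pure 1.58-bit estimate; the article does not say what accounts for the difference.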

Energy efficiency and runtime speed are the series' other core advantages. On the iPhone 17 Pro Max, the 8B version reaches up to 27 tok/s, with energy efficiency improved by about 3 to 4 times. For developers deploying high-performance AI on edge devices such as phones and laptops, this means performance close to that of full-precision models at a very small memory cost.

The Ternary Bonsai models already run natively on Apple devices through the MLX framework. The model weights are distributed under the Apache 2.0 license.

(Source: BlockBeats)
