PrismML launches 1.58-bit model Ternary Bonsai, cutting memory use to one-ninth and surpassing peers in intelligence density

ME News, April 17 (UTC+8): According to monitoring by Dongcha Beating, PrismML has released the Ternary Bonsai series of language models, which use 1.58-bit (ternary) weights to cut model memory usage to about one-ninth of a 16-bit model's while maintaining high performance. The series comes in 8B, 4B, and 1.7B parameter sizes, is open-sourced on Hugging Face, and runs natively on Apple devices.
A 1.58-bit model restricts every weight in the neural network to one of three values: {-1, 0, +1}. The name comes from the log2(3) ≈ 1.58 bits needed to encode a three-way choice. Compared with earlier ultra-compressed 1-bit models (weights limited to {-1, +1}), adding the value 0 lets the model switch off redundant connections, so it can retain complex reasoning capabilities at a very small size.
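To make the three-value idea concrete, here is a minimal sketch of ternary quantization. It assumes absmean scaling, a scheme described in published ternary-weight work; the article does not say which method PrismML uses, and the function names and sample weights are hypothetical.

```python
import math

def ternarize(weights, eps=1e-8):
    """Map float weights to codes in {-1, 0, +1} plus one shared scale.

    Assumption: absmean scaling (divide by the mean absolute value,
    round, then clip). This is an illustration, not PrismML's method.
    """
    scale = sum(abs(w) for w in weights) / len(weights) + eps
    codes = [max(-1, min(1, round(w / scale))) for w in weights]
    return codes, scale

def dequantize(codes, scale):
    """Rebuild approximate float weights from ternary codes."""
    return [c * scale for c in codes]

# Hypothetical weights: small values collapse to 0, large ones to +/-1.
weights = [0.9, -0.05, 0.4, -1.2, 0.02, -0.6]
codes, scale = ternarize(weights)
print(codes)  # [1, 0, 1, -1, 0, -1]

# Information content of one three-valued weight, in bits.
bits_per_weight = math.log2(3)
print(round(bits_per_weight, 3))  # 1.585
```

In this toy example the two near-zero weights map to 0, which is the "eliminate redundant connections" effect that a {-1, +1} scheme cannot express.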
The Ternary Bonsai 8B weight file is only 1.75 GB and posts an average benchmark score of 75.5. That is 5 points higher than PrismML's own 1-bit version, and the model also significantly surpasses comparable dense models such as Qwen3 in "intelligence density" (performance per GB of VRAM).
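The size figures can be sanity-checked with simple arithmetic. The sketch below assumes exactly 8 billion parameters and decimal gigabytes (1 GB = 10^9 bytes); the real parameter count and file layout are not given in the article, so the results are rough estimates only.

```python
import math

PARAMS = 8e9          # assumption: "8B" taken as exactly 8 billion weights
BYTES_PER_GB = 1e9    # assumption: decimal gigabytes

def size_gb(bits_per_weight):
    """Storage needed for PARAMS weights at a given bit width."""
    return PARAMS * bits_per_weight / 8 / BYTES_PER_GB

fp16_gb = size_gb(16)              # 16-bit baseline
ideal_gb = size_gb(math.log2(3))   # theoretical ternary lower bound
reported_gb = 1.75                 # file size quoted in the article

# Effective bits per weight implied by the reported file size.
implied_bits = reported_gb * BYTES_PER_GB * 8 / PARAMS
ratio = fp16_gb / reported_gb

print(f"16-bit baseline: {fp16_gb:.2f} GB")
print(f"ideal ternary:   {ideal_gb:.3f} GB")
print(f"implied bits/weight: {implied_bits:.2f}")
print(f"compression vs 16-bit: {ratio:.2f}x")
```

Under these assumptions the reported file works out to about 1.75 bits per weight and roughly a 9.1x reduction, consistent with the "one-ninth" claim. The gap above the 1.58-bit ideal may come from scale factors or layers stored at higher precision, but that is a guess rather than something the article states.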
Energy efficiency and speed are the series' other core advantages. On the iPhone 17 Pro Max, the 8B version runs at 27 tokens/sec, with energy efficiency improved by about 3 to 4 times.
For developers who need to deploy high-performance AI on phones, laptops, and other edge devices, this means near-full-precision model intelligence at a minimal memory cost.
Currently, the Ternary Bonsai models are natively supported on Apple devices through the MLX framework. Model weights are distributed under the Apache 2.0 license.
(Source: BlockBeats)
OldKeyboardTraitor
· 5h ago
The three-value weighting is actually much more difficult than binarization; the presence of 0 allows for more flexible information retention, and PrismML's choice at this step is precise.
BoredInBlockspace
· 5h ago
1.75GB fits 8B parameters; in the future, local LLMs will truly become the norm.
0xLateDiner
· 5h ago
1.58-bit weights are too aggressive; the VRAM is directly reduced to one-ninth, and this compression ratio is quite impressive.
GateUser-0f33f9ef
· 5h ago
{-1,0,+1} three-value quantization, mathematical elegance in engineering has also been realized.
ProofOfSnack
· 6h ago
The name Ternary Bonsai is clever; the three values are like pruning a bonsai, simplifying by removing the unnecessary.
BerryColdWallet
· 6h ago
Running the 8B model at 27 tokens/sec on iPhone? Apple users are ecstatic
GateUser-e1cfc287
· 6h ago
The energy efficiency ratio increases by 3-4 times, and the power consumption anxiety of edge AI has been solved.
L2Mailman
· 6h ago
MLX native support, adding another piece to the Apple ecosystem closed loop
FoldedCosmosCat
· 6h ago
Open source + Apache 2.0, PrismML has opened up this pattern.