PrismML launches 1.58-bit model Ternary Bonsai, cutting memory use to one-ninth and surpassing peers in intelligence density

ME News Report, April 17 (UTC+8): according to monitoring by Dongcha Beating, PrismML has released the Ternary Bonsai series of language models, which use 1.58-bit (ternary) weights to cut model memory usage to about one-ninth that of a 16-bit model while maintaining high performance. The series comes in 8B, 4B, and 1.7B parameter sizes, is open-sourced on Hugging Face, and runs natively on Apple devices.
A 1.58-bit model restricts every weight in the neural network to one of three values, {-1, 0, +1}; a three-valued weight carries log2(3) ≈ 1.58 bits of information, hence the name. Compared with earlier ultra-compressed 1-bit models (weights only {-1, +1}), the added "0" value can effectively eliminate redundant connections, allowing the model to retain complex reasoning capabilities at a very small size.
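As a rough illustration of what ternary weights look like in practice, the sketch below maps a handful of floating-point weights to {-1, 0, +1} plus one shared scale. It assumes an "absmean" rounding scheme (divide by the mean absolute value, round, clip); the report does not describe how PrismML actually quantizes or trains its models, so the function names and the sample weights here are purely hypothetical.

```python
import math

def ternarize(weights):
    """Map floats to codes in {-1, 0, +1} plus one shared scale.

    Assumed scheme (not confirmed as PrismML's method):
    scale = mean(|w|); code = round(w / scale), clipped to [-1, 1].
    """
    scale = sum(abs(w) for w in weights) / len(weights)
    if scale == 0.0:
        # All-zero input: every code is 0 and the scale is irrelevant.
        return [0] * len(weights), 0.0
    codes = [max(-1, min(1, round(w / scale))) for w in weights]
    return codes, scale

def dequantize(codes, scale):
    """Reconstruct approximate weights from ternary codes."""
    return [c * scale for c in codes]

# Hypothetical sample weights, chosen only for illustration.
weights = [0.9, -0.05, -1.3, 0.4, 0.02, -0.6]
codes, scale = ternarize(weights)

print(codes)                      # small weights collapse to 0
print(dequantize(codes, scale))   # every value is -scale, 0, or +scale
print(round(math.log2(3), 3))     # information per three-valued weight, in bits
```

In this toy run the two near-zero weights become 0, which is the "redundant connection" effect the article attributes to the extra value.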
The released Ternary Bonsai 8B weight file is only 1.75 GB and posts an average benchmark score of 75.5. That is 5 points higher than PrismML's own 1-bit version, and the model also leads comparable dense models such as Qwen3 by a wide margin in "intelligence density" (benchmark performance per GB of VRAM).
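The figures above can be sanity-checked with a little arithmetic. The sketch below assumes exactly 8 billion weights, decimal gigabytes, and the weight-file size as a stand-in for VRAM use; none of these assumptions comes from the report, and no competitor scores are included because the report does not give them.

```python
PARAMS = 8e9      # assumption: exactly 8 billion weights
GB = 1e9          # assumption: decimal gigabytes
FILE_GB = 1.75    # weight-file size quoted in the article
SCORE = 75.5      # average benchmark score quoted in the article

# Size of the same parameter count stored at 16 bits per weight.
fp16_gb = PARAMS * 16 / 8 / GB

# How many times smaller the ternary file is than the 16-bit size.
ratio = fp16_gb / FILE_GB

# Average storage actually spent per weight in the released file.
bits_per_weight = FILE_GB * GB * 8 / PARAMS

# "Intelligence density": benchmark points per GB (file size as proxy).
density = SCORE / FILE_GB

print(f"16-bit size:       {fp16_gb:.1f} GB")
print(f"compression ratio: {ratio:.2f}x")
print(f"bits per weight:   {bits_per_weight:.2f}")
print(f"density:           {density:.1f} points/GB")
```

Under these assumptions the ratio comes out slightly above 9, in line with the "one-ninth" claim, and the file spends about 1.75 bits per weight, a little more than the 1.58-bit theoretical figure. The report does not say what accounts for the difference.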
Energy efficiency and speed are another core advantage of this series. On the iPhone 17 Pro Max, the 8B version runs at 27 tokens/sec, with energy efficiency improved by about 3 to 4 times.
For developers who need to deploy high-performance AI on phones, laptops, and other edge devices, this means near-full-precision model intelligence at minimal memory cost.
Currently, the Ternary Bonsai models are natively supported on Apple devices via the MLX framework. Model weights are distributed under the Apache 2.0 license.
(Source: BlockBeats)
Comment
SushiSlippage
· 10h ago
{-1,0,+1} reminds me of BinaryNet back in the day, but this time it actually seems to work.
HexiHoodie
· 10h ago
Energy efficiency is up 3 to 4 times, meaning the battery finally won't lose 50% of its charge in half an hour.
MevInRetrospect
· 10h ago
Apache 2.0 open source is highly praised; this is real open source, unlike some that just do gimmicks.
TheClarityAfterLiquidating
· 10h ago
27 tok/s on a phone, faster than my laptop running 7B back in the day, times have changed
0XNightRun
· 10h ago
Native support for MLX is crucial, and Apple ecosystem users are ecstatic—no more hassle with conversions.
PaperSculptureOctopusPosition
· 10h ago
Ternary Bonsai, this name is quite interesting; ternary weighting is indeed a delicately designed bonsai-level structure.
AutumnSlopeCabin
· 10h ago
One-ninth of the video memory? I never even dared to imagine it before, and now the iPhone can run large models locally.
RedTelephoneBoothRuins
· 10h ago
1.75GB runs an 8B model, this compression ratio is incredible, mobile AI can finally be used.