PrismML launches 1.58-bit Ternary Bonsai models, cutting memory use to one-ninth and leading peers in intelligence density


ME News, April 17 (UTC+8). According to monitoring by Dongcha Beating, PrismML has released the Ternary Bonsai series of language models. Using 1.58-bit (ternary-weight) quantization, the models cut VRAM usage to one-ninth of that of a 16-bit model while maintaining high performance. The series comes in three sizes: 8B, 4B, and 1.7B parameters. The models are open-sourced on Hugging Face and run natively on Apple devices.

A 1.58-bit model restricts each neural network weight to one of three values, {-1, 0, +1}; the name comes from log2(3) ≈ 1.58 bits of information per weight. Compared with earlier ultra-compressed 1-bit models (weights in {-1, +1} only), adding the “0” value lets the model drop redundant connections, which helps it preserve complex reasoning capability at a very small size. The released Ternary Bonsai 8B weight file is only 1.75 GB and achieves a benchmark average score of 75.5. That is 5 points higher than the company’s own 1-bit version, and the model also significantly leads comparable dense models such as Qwen3 in “intelligence density” (performance per GB of VRAM).
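To make the ternary idea concrete, here is a minimal sketch of how a list of full-precision weights could be mapped to {-1, 0, +1} plus a single scale factor. It uses absmean scaling (divide by the mean absolute weight, round, clip), which is one common scheme for ternary models; the source does not say which scheme PrismML actually uses, so the functions `ternarize` and `dequantize` and the sample weights are illustrative assumptions only.

```python
import math


def bits_per_ternary_weight() -> float:
    """Information content of one weight drawn from three possible values."""
    return math.log2(3)


def ternarize(weights, eps=1e-8):
    """Absmean ternarization (assumed scheme, not confirmed as PrismML's).

    Returns (codes, scale), where every code is -1, 0, or +1 and
    scale * code approximates the original weight.
    """
    scale = sum(abs(w) for w in weights) / len(weights) + eps
    # Round to the nearest integer, then clip into the ternary range.
    codes = [max(-1, min(1, round(w / scale))) for w in weights]
    return codes, scale


def dequantize(codes, scale):
    """Reconstruct approximate weights from ternary codes and the scale."""
    return [c * scale for c in codes]


# Hypothetical example weights, chosen only for illustration.
weights = [0.9, -0.05, 0.4, -1.2, 0.02, -0.6]
codes, scale = ternarize(weights)

print(codes)                                  # [1, 0, 1, -1, 0, -1]
print(round(bits_per_ternary_weight(), 3))    # 1.585
```

Small weights such as -0.05 and 0.02 collapse to 0, which is the "remove redundant connections" effect that a strict {-1, +1} scheme cannot express.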

Energy efficiency and speed are the other core advantages of this series. On the iPhone 17 Pro Max, the 8B version reaches 27 tok/s, with energy efficiency improved by about 3 to 4 times. For developers deploying high-performance AI on edge devices such as phones and laptops, this means performance close to that of full-precision models at a very small memory cost.
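The memory figures quoted for the 8B model can be sanity-checked with simple arithmetic. The sketch below assumes exactly 8 billion parameters, 1 GB = 10^9 bytes, and a 16-bit baseline of 2 bytes per weight; it also uses the 1.75 GB weight file size as a stand-in for VRAM usage, which the source does not state separately. All variable names are illustrative.

```python
# Assumptions (not stated in the source): exact parameter count and decimal GB.
PARAMS = 8e9                 # parameters in the 8B model
GB = 1e9                     # bytes per gigabyte
FP16_BYTES_PER_PARAM = 2     # 16-bit baseline

# Figures taken from the article.
TERNARY_FILE_GB = 1.75       # released weight file size
BENCHMARK_AVG = 75.5         # benchmark average score

fp16_gb = PARAMS * FP16_BYTES_PER_PARAM / GB          # 16-bit footprint in GB
reduction = fp16_gb / TERNARY_FILE_GB                 # how many times smaller
effective_bits = TERNARY_FILE_GB * GB * 8 / PARAMS    # stored bits per weight
density = BENCHMARK_AVG / TERNARY_FILE_GB             # score points per GB

print(f"16-bit footprint: {fp16_gb:.1f} GB")          # 16.0 GB
print(f"reduction factor: {reduction:.2f}x")          # 9.14x
print(f"effective bits per weight: {effective_bits:.2f}")  # 1.75
print(f"score per GB: {density:.1f}")                 # 43.1
```

Under these assumptions the file works out to about 1.75 bits per weight, slightly above the 1.58-bit ideal. That gap would be consistent with some scale factors or tensors being stored at higher precision, though the source does not confirm this. The resulting ratio of roughly 9.1 matches the "one-ninth" claim.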

Currently, the Ternary Bonsai models are natively supported on Apple devices via the MLX framework. The model weights are distributed under the Apache 2.0 license.
(Source: BlockBeats)

Comment
WalletHealthInspector
· 5h ago
Ternary quantization plus native MLX: Apple's ecosystem forms a closed loop, putting immense pressure on the Android camp.
RouterRunner
· 10h ago
It leads peers with a score of 75.5, but how much worse is it than full precision? Are there any ablation studies to check?
NeonFusionIceCream
· 10h ago
VRAM cut to 1/9 and edge deployment costs plummeting: it feels like the inflection point for on-device AI has truly arrived.
GateUser-c29c3db9
· 10h ago
27 tok/s on the iPhone 17 Pro Max: the NPU in Apple's chip is finally being fully utilized, and the MLX ecosystem is about to take off.
OrderCancellerAfterTheRain
· 10h ago
The name Bonsai is well-chosen; after pruning, only three values remain, and the model is indeed finely crafted like a bonsai.
TvlTeaTime
· 10h ago
The Apache 2.0 open-source release is welcome, but I'm curious how the training is done and how backpropagation works with ternary weights.
GateUser-8ca669fd
· 10h ago
Ternary quantization {-1, 0, +1}: an idea from old papers has finally been put into practice, and PrismML's engineering work is beautifully done.
BugBountyBuddy
· 10h ago
1.75GB to run 8B? That's an incredible compression rate. Running large models locally on a phone is finally no longer a dream.