PrismML has released the Ternary Bonsai series, which uses 1.58-bit weights {-1, 0, +1} and needs only about one-ninth of the memory of a 16-bit model. The 8B, 4B, and 1.7B sizes are open-sourced on Hugging Face. The 8B weights are approximately 1.75 GB, with an average benchmark score of 75.5, leading among peers. On the iPhone 17 Pro Max, the 8B model runs at 27 tokens/sec, with a 3–4 times improvement in energy efficiency. The weights are distributed under Apache 2.0 and run natively on Apple devices via the MLX framework.

MeNews

2026-05-21 04:14:48


ME News Report, April 17 (UTC+8): according to Dongcha Beating monitoring, PrismML has released the Ternary Bonsai series of language models, which use 1.58-bit (ternary) weights to cut model memory usage to about one-ninth of a 16-bit model while maintaining high performance. The series comes in 8B, 4B, and 1.7B parameter sizes, is open-sourced on Hugging Face, and runs natively on Apple devices.
A 1.58-bit model restricts each neural-network weight to one of three values, {-1, 0, +1}; the name comes from log2(3) ≈ 1.58 bits of information per weight. Compared with earlier ultra-compressed 1-bit models (weights in {-1, +1} only), the added "0" value lets the model switch off redundant connections, helping it retain complex reasoning capabilities at a very small size.
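To make the three-value idea concrete, here is a minimal sketch of ternary quantization in plain Python. It assumes absmean scaling (divide by the mean absolute weight, round, clip to [-1, 1]), a scheme described for BitNet b1.58-style models; the article does not say which scheme PrismML actually uses, and the function names and sample weights are illustrative only.

```python
import math

def ternarize(weights):
    """Map floats to codes in {-1, 0, +1} plus one shared scale.

    Assumption: absmean scaling, i.e. scale = mean(|w|), then
    round(w / scale) clipped to [-1, 1]. Not confirmed as PrismML's method.
    """
    scale = sum(abs(w) for w in weights) / len(weights)
    if scale == 0.0:
        return [0] * len(weights), 0.0
    codes = [max(-1, min(1, round(w / scale))) for w in weights]
    return codes, scale

def dequantize(codes, scale):
    """Approximate the original weights as code * scale."""
    return [c * scale for c in codes]

# Information carried by one three-valued weight: log2(3) bits.
bits_per_weight = math.log2(3)

# Made-up example weights; small ones collapse to 0 ("connection removed").
weights = [0.9, -0.05, -1.2, 0.4, 0.02, -0.7]
codes, scale = ternarize(weights)
approx = dequantize(codes, scale)

print(codes)
print(round(bits_per_weight, 3))
```

In this toy input the two near-zero weights are the ones that become 0, which is the "redundant connection" case a 1-bit {-1, +1} scheme cannot represent.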
The released Ternary Bonsai 8B weight file is only 1.75 GB and achieves an average benchmark score of 75.5. That is 5 points higher than PrismML's own 1-bit version, and it gives the model a significant lead in "intelligence density" (benchmark performance per GB of VRAM) over comparable dense models such as Qwen3.
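The size figures can be sanity-checked with simple arithmetic. The sketch below assumes a nominal 8e9 parameters and 1 GB = 1e9 bytes, and it treats "intelligence density" as average benchmark score divided by weight size in GB; the article does not give the exact formula, so that last line is an interpretation.

```python
import math

def weight_size_gb(n_params, bits_per_weight):
    """Raw weight storage in GB (1 GB = 1e9 bytes), ignoring metadata."""
    return n_params * bits_per_weight / 8 / 1e9

N_PARAMS = 8e9            # assumption: nominal "8B" parameter count
REPORTED_GB = 1.75        # weight file size from the article
REPORTED_SCORE = 75.5     # average benchmark score from the article

fp16_gb = weight_size_gb(N_PARAMS, 16)                 # 16-bit baseline
ideal_ternary_gb = weight_size_gb(N_PARAMS, math.log2(3))  # lower bound

# How much smaller the reported file is than the 16-bit baseline.
compression_ratio = fp16_gb / REPORTED_GB

# Average bits actually spent per parameter in the reported file.
effective_bits = REPORTED_GB * 1e9 * 8 / N_PARAMS

# Assumed reading of "intelligence density": score per GB of weights.
score_per_gb = REPORTED_SCORE / REPORTED_GB

print(f"16-bit baseline: {fp16_gb:.2f} GB")
print(f"ideal ternary:   {ideal_ternary_gb:.2f} GB")
print(f"ratio:           {compression_ratio:.2f}x")
print(f"effective bits:  {effective_bits:.2f}")
print(f"score per GB:    {score_per_gb:.1f}")
```

Under these assumptions the ratio comes out slightly above 9, consistent with the "one-ninth" claim, and the gap between the ideal 1.58 bits and the effective 1.75 bits per weight would be accounted for by packing overhead, scales, or layers kept at higher precision.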
Energy efficiency and speed are the other core advantages of this series. On the iPhone 17 Pro Max, the 8B version runs at 27 tokens/sec, with energy efficiency improved by about 3 to 4 times.
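A rough way to see why smaller weights help speed on a phone: if decoding is memory-bound and every generated token reads the full weight file once (a simplifying assumption, not something the article states), the reported throughput implies the following weight traffic.

```python
TOKENS_PER_SEC = 27.0     # reported throughput on iPhone 17 Pro Max
TERNARY_GB = 1.75         # reported 8B weight file size
FP16_GB = 16.0            # assumption: 8e9 params at 2 bytes each

# Time budget per generated token.
ms_per_token = 1000.0 / TOKENS_PER_SEC

# Assumption: each token streams all weights from memory exactly once.
ternary_traffic_gb_s = TOKENS_PER_SEC * TERNARY_GB
fp16_traffic_gb_s = TOKENS_PER_SEC * FP16_GB

print(f"{ms_per_token:.1f} ms per token")
print(f"ternary weights: {ternary_traffic_gb_s:.2f} GB/s")
print(f"16-bit weights:  {fp16_traffic_gb_s:.2f} GB/s at the same rate")
```

The point of the comparison is the roughly ninefold difference in bytes moved per token; the absolute numbers depend on caching, batching, and kernel details that this estimate ignores.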
For developers who need to deploy high-performance AI on phones, laptops, and other edge devices, this means near-full-precision model intelligence at minimal memory cost.
Currently, the Ternary Bonsai models are natively supported on Apple devices via the MLX framework. Model weights are distributed under the Apache 2.0 license.
(Source: BlockBeats)


SushiSlippage

· 10h ago

{-1,0,+1} reminds me of BinaryNet back in the day, but this time it actually seems to work.


HexiHoodie

· 10h ago

The energy efficiency ratio has increased by 3-4 times, meaning the battery life finally won't lose 50% of its charge in half an hour.


MevInRetrospect

· 10h ago

Apache 2.0 open source is highly praised; this is real open source, unlike some that just do gimmicks.


TheClarityAfterLiquidating

· 10h ago

27 tok/s on a phone, faster than my laptop running 7B back in the day, times have changed


0XNightRun

· 10h ago

Native support for MLX is crucial, and Apple ecosystem users are ecstatic—no more hassle with conversions.


PaperSculptureOctopusPosition

· 10h ago

Ternary Bonsai, this name is quite interesting; ternary weighting is indeed a delicately designed bonsai-level structure.


AutumnSlopeCabin

· 10h ago

One-ninth of the video memory? I never even dared to imagine it before, and now the iPhone can run large models locally.


RedTelephoneBoothRuins

· 10h ago

1.75GB runs an 8B model, this compression ratio is incredible, mobile AI can finally be used.

