PrismML has released the Ternary Bonsai series, which restricts weights to the three values {-1, 0, +1} (1.58 bits per weight) and needs only about one-ninth the VRAM of a 16-bit model. The 8B, 4B, and 1.7B sizes are open-sourced on Hugging Face. The 8B weights are approximately 1.75 GB, with a benchmark average score of 75.5, leading among peers. On the iPhone 17 Pro Max, the 8B model runs at 27 tokens/sec, with a 3 to 4 times improvement in energy efficiency. The weights are distributed under Apache 2.0 and run natively on Apple devices via the MLX framework.

MeNews

2026-05-21 06:47:33


ME News, April 17 (UTC+8). According to Dongcha Beating monitoring, PrismML has released the Ternary Bonsai series of language models. By using 1.58-bit (ternary) weights, the models cut VRAM usage to about one-ninth of that of a 16-bit model while maintaining high performance. The series comes in three parameter sizes: 8B, 4B, and 1.7B. All three are open-sourced on Hugging Face and run natively on Apple devices.

A 1.58-bit model restricts each neural network weight to one of three values, {-1, 0, +1}; the name comes from log2(3) ≈ 1.58 bits of information per weight. Compared with the ultra-compressed 1-bit models pursued previously (weights limited to {-1, +1}), adding the “0” value lets the model drop redundant connections, so it can preserve complex reasoning capabilities at an extremely small size. The released Ternary Bonsai 8B weight file is only 1.75 GB, with a benchmark average score of 75.5. That is 5 points higher than the company’s own 1-bit version, and it gives the model a significant lead in “intelligence density” (performance contributed per GB of VRAM) over comparable dense models such as Qwen3.
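To make the idea of ternary weights concrete, here is a minimal Python sketch. The article does not describe how PrismML actually produces its weights, so the absmean rounding scheme below is an assumption chosen for illustration, and the names `ternarize` and `dequantize` are hypothetical.

```python
import math

# Information content of one three-valued weight: log2(3) bits.
BITS_PER_TERNARY_WEIGHT = math.log2(3)  # roughly 1.585, hence "1.58-bit"

def ternarize(weights):
    """Map floats to {-1, 0, +1} plus one shared scale.

    Assumed scheme (not confirmed as PrismML's): divide by the mean
    absolute value, round to the nearest integer, clip to [-1, 1].
    """
    scale = sum(abs(w) for w in weights) / len(weights)
    if scale == 0:
        return [0] * len(weights), 0.0
    ternary = [max(-1, min(1, round(w / scale))) for w in weights]
    return ternary, scale

def dequantize(ternary, scale):
    """Approximate the original weights from ternary values and the scale."""
    return [t * scale for t in ternary]

weights = [0.9, -0.05, -1.3, 0.02, 0.4, -0.6]
ternary, scale = ternarize(weights)
print(ternary)                      # small weights collapse to 0
print(dequantize(ternary, scale))   # coarse reconstruction
```

In this toy example the two weights closest to zero become 0, which is the "removing redundant connections" effect described above; a 1-bit scheme would have to force them to -1 or +1.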

Energy efficiency and operating speed are the series' other core advantages. On the iPhone 17 Pro Max, the 8B version reaches 27 tok/s, with an energy-efficiency improvement of about 3 to 4 times. For developers who need to deploy high-performance AI on edge devices such as phones and laptops, this means performance close to that of full-precision models at a very small memory cost.
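The reported figures can be cross-checked with some quick arithmetic. The sketch below uses only numbers quoted in the article, plus two labeled assumptions: that "8B" means exactly 8 billion parameters, and that gigabytes are decimal. It also uses the weight file size as a stand-in for VRAM when computing density, which is a simplification.

```python
# Figures reported in the article.
WEIGHT_FILE_GB = 1.75   # Ternary Bonsai 8B weight file
BENCHMARK_AVG = 75.5    # reported benchmark average score
TOKENS_PER_SEC = 27     # 8B model on the iPhone 17 Pro Max

# Assumptions for illustration only (not stated in the article).
PARAMS = 8e9            # "8B" taken as exactly 8 billion parameters
BYTES_PER_GB = 1e9      # decimal gigabytes

# Size of the same parameter count stored at 16 bits per weight.
fp16_gb = PARAMS * 16 / 8 / BYTES_PER_GB

# How many times smaller the ternary file is than the 16-bit baseline.
compression = fp16_gb / WEIGHT_FILE_GB

# Average bits actually spent per parameter in the released file.
effective_bits = WEIGHT_FILE_GB * BYTES_PER_GB * 8 / PARAMS

# "Intelligence density": benchmark points per GB (file size as VRAM proxy).
density = BENCHMARK_AVG / WEIGHT_FILE_GB

# Average time to generate one token.
ms_per_token = 1000 / TOKENS_PER_SEC

print(f"16-bit baseline: {fp16_gb:.1f} GB, ratio {compression:.2f}x")
print(f"effective bits per weight: {effective_bits:.2f}")
print(f"density: {density:.1f} points/GB, latency: {ms_per_token:.0f} ms/token")
```

Under these assumptions the ratio comes out slightly above 9, consistent with the "one-ninth" claim. The effective 1.75 bits per weight is a little higher than the theoretical 1.58, presumably because the file also holds scales and other non-ternary data, though the article does not say.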

Currently, the Ternary Bonsai models are natively supported on Apple devices via the MLX framework. The model weights are distributed under the Apache 2.0 license.
(Source: BlockBeats)




Comment


WalletHealthInspector

· 5h ago

Ternary quantization + native MLX, Apple’s ecosystem is closed, putting immense pressure on the Android camp


RouterRunner

· 10h ago

Leading peers by 75.5 points, but how much worse is it compared to full precision? Are there any ablation studies to check?


NeonFusionIceCream

· 10h ago

Video memory reduced to 1/9, edge deployment costs plummeted, it feels like the inflection point for on-device AI has truly arrived.


GateUser-c29c3db9

· 10h ago

iPhone 17 Pro Max 27 TOP/s, Apple's chip NPU has finally been fully utilized, MLX ecosystem is about to take off


OrderCancellerAfterTheRain

· 10h ago

The name Bonsai is well-chosen; after pruning, only three values remain, and the model is indeed finely crafted like a bonsai.


TvlTeaTime

· 10h ago

Apache 2.0 open source is well-received, but I'm curious about how the training is done, and how the ternary weight backpropagation works.


GateUser-8ca669fd

· 10h ago

Ternary quantization {-1, 0, +1}, the idea from old papers has been implemented, and PrismML's engineering work is done beautifully.


BugBountyBuddy

· 10h ago

1.75GB to run 8B? That's an incredible compression rate. Running large models locally on a phone is finally no longer a dream.



PrismML launches 1.58-bit model Ternary Bonsai, cutting memory use to one-ninth and surpassing peers in intelligence density
