PrismML just released something quite interesting: the Ternary Bonsai series of language models. What caught my attention is the drastic cut in GPU memory consumption, down to roughly one-ninth of a comparable 16-bit model. The trick is ternary weights: each weight can take only three values, -1, 0, or +1, which costs about 1.58 bits (log2 of 3) instead of 16. It sounds technical, but the idea is to strip redundant connections out of the network so reasoning improves without sacrificing performance.
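PrismML hasn't published Bonsai's exact quantization scheme, but the general 1.58-bit idea can be sketched with the well-known "absmean" ternary recipe (as used in BitNet b1.58): snap each weight to -1, 0, or +1 and keep one shared full-precision scale per tensor. The function names here are illustrative, not PrismML's API.

```python
import numpy as np

def ternary_quantize(w: np.ndarray):
    """Map full-precision weights to {-1, 0, +1} plus a shared scale.

    This is the generic absmean recipe, not necessarily what
    Bonsai does internally.
    """
    scale = np.abs(w).mean() + 1e-8            # per-tensor absmean scale
    q = np.clip(np.round(w / scale), -1, 1)    # snap to the three levels
    return q.astype(np.int8), float(scale)

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    """Recover an approximation of the original weights."""
    return q.astype(np.float32) * scale

# Demo: every quantized weight really is ternary.
w = np.random.randn(4, 4).astype(np.float32)
q, s = ternary_quantize(w)
print(sorted(np.unique(q)))  # subset of [-1, 0, 1]
```

Storing an int8 per weight is wasteful, of course; real deployments pack several ternary values per byte, which is where the ~1.58-bit figure comes from.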
On price and accessibility, the interesting part is that the Bonsai 8B model needs only 1.75 GB of weight storage, which makes it very practical for edge devices. Compared with heavier alternatives, the cost-benefit is clearly favorable: it averages 75.5 across benchmarks, surpassing both its 1-bit predecessor and dense models of similar size. Best of all, it runs natively on Apple devices, no awkward workarounds required.
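The headline numbers check out with back-of-the-envelope arithmetic. An 8B-parameter model at 16 bits per weight is 16 GB; pure ternary weights at log2(3) bits would be about 1.58 GB, and the reported 1.75 GB (presumably including higher-precision embeddings or metadata, which is my assumption, not a published breakdown) lands right at the "one-ninth" claim:

```python
import math

params = 8e9  # 8B parameters

# 16-bit baseline: 2 bytes per weight.
fp16_gb = params * 16 / 8 / 1e9            # 16.0 GB

# Ideal ternary packing: log2(3) ~= 1.585 bits per weight.
ternary_gb = params * math.log2(3) / 8 / 1e9   # ~1.58 GB

# Ratio against the 1.75 GB figure actually reported for Bonsai 8B.
reported_ratio = fp16_gb / 1.75            # ~9.1x, i.e. "one-ninth"

print(f"{fp16_gb:.2f} GB fp16, {ternary_gb:.2f} GB ideal ternary, "
      f"{reported_ratio:.1f}x smaller as shipped")
```

So the 1.75 GB figure sits slightly above the information-theoretic floor, which is what you'd expect once a few tensors stay in higher precision.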

In terms of speed, it reaches 27 tokens per second on an iPhone 17 Pro Max, with 3 to 4 times better energy efficiency. That's a significant leap for on-device inference. Models are available at 8B, 4B, and 1.7B parameters, all open source on Hugging Face under Apache 2.0. For developers who want capable AI without spending a fortune on infrastructure, the Bonsai models look like a pretty solid option.