Tinygrad claims GLM 5.2 can run at 120 tokens per second on two interconnected Blackwell tinyboxes, a setup priced at $150k


BlockBeats News, June 21: GPU seller Tinygrad announced that, according to reliable sources, the GLM 5.2 model can reach inference speeds of 120 tokens per second on two interconnected Blackwell-architecture tinyboxes.

The configuration is priced at $150k and is offered either as two standard tinyboxes or as a single tinybox Pro; Tinygrad says both options reach the performance above. The company is marketing this as a selling point, pitching private deployment as "one-time purchase, never paying cloud fees" and positioning it as a direct competitor to cloud inference services billed on demand.

The GLM team has not confirmed the claim, and Tinygrad has not disclosed further technical details.


