Inference cost only one-twentieth of GPT-5.5: Gemini 3.2 real-time model appears on Google Cloud

According to Beating Monitoring, a base model option named gemini-3.2-flash-lite-live-preview has appeared in the model filter list of the Google Cloud Console. This is another sighting of the model series on an official platform, following traces found earlier this month in iOS app build packages and AI Studio. The new option carries the suffixes lite and live, suggesting that Google is splitting out specialized variants for ultra-low-latency real-time interaction. Abacus.AI CEO Bindu Reddy previously said that Gemini 3.2 Flash's coding and reasoning capabilities reach 92% of GPT-5.5's, but that, thanks to distillation and sparsification techniques, its inference cost is only one-twentieth of GPT-5.5's, with most queries answered in under 200 milliseconds. With the cloud listing appearing ahead of launch, industry insiders expect this lightweight, cost-efficiency-focused model to be officially released at Google I/O on May 20.
