Meituan releases the native multimodal large model LongCat-Next


Sina Tech reported on the morning of March 27 that Meituan has released and fully open-sourced LongCat-Next, a native multimodal large model, along with its core component: the discrete native-resolution vision tokenizer (dNaViT).

The model breaks with the "language-centered" patchwork architecture common in current large models, unifying images, speech, and text into homogeneous discrete tokens. By relying on a pure next-token-prediction (NTP) paradigm, LongCat-Next makes vision and speech the AI's "native mother tongue."
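The idea of a shared discrete token space can be illustrated with a minimal sketch. The vocabulary sizes, mapping functions, and layout below are hypothetical (the article does not disclose LongCat-Next's internals); the sketch only shows how separate modality codebooks can be packed into one flat ID space so a single autoregressive model predicts the next token regardless of modality.

```python
# Hypothetical sketch: unify text, image, and speech codebooks into one
# shared vocabulary for next-token prediction (NTP). All sizes are assumed.
TEXT_VOCAB = 32000    # assumed text vocabulary size
IMAGE_CODES = 8192    # assumed dNaViT codebook size
SPEECH_CODES = 4096   # assumed speech codebook size


def image_token(code: int) -> int:
    """Map an image codebook index into the shared flat ID space."""
    assert 0 <= code < IMAGE_CODES
    return TEXT_VOCAB + code


def speech_token(code: int) -> int:
    """Map a speech codebook index into the shared flat ID space."""
    assert 0 <= code < SPEECH_CODES
    return TEXT_VOCAB + IMAGE_CODES + code


# A mixed-modality sequence: text IDs, then image tokens, then a speech
# token, all drawn from the same ID space.
sequence = [11, 42, image_token(7), image_token(1030), speech_token(5)]

# NTP training pairs: predict sequence[t+1] from the prefix sequence[:t+1].
pairs = [(sequence[: t + 1], sequence[t + 1]) for t in range(len(sequence) - 1)]
```

Because every token lives in one vocabulary, the model needs no modality-specific prediction head: the same softmax over the flat ID space covers text, image, and speech continuations.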

According to the announcement, LongCat-Next achieves three key technological breakthroughs: first, a discrete native autoregressive architecture (DiNA) that breaks down the barriers between modalities; second, the discrete native-resolution vision tokenizer (dNaViT), which builds a "dictionary" for the visual world; and third, a semantically aligned encoder that addresses the industry-wide challenge of information loss during discretization.


Responsible editor: Jiang Yuhan
