ByteDance open-sources Cola DLM: Redefining text generation with diffusion models

ME News Report, May 16 (UTC+8), according to Beating Monitoring, ByteDance's Seed team has open-sourced Cola DLM.
Cola DLM is a family of continuous latent diffusion language models. Rather than following the fixed token-by-token generation path of standard large language models, it first organizes high-level semantics and then decodes them into specific words.
The core of Cola DLM is a Text VAE paired with a block-causal DiT (diffusion transformer).
The Text VAE first maps discrete text into a continuous latent space, and the block-causal DiT then learns a prior over those latents through Flow Matching.
Finally, a conditional decoder restores the latent variables back into text.
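The report does not spell out what "block-causal" means for the DiT. A minimal sketch, assuming the common reading in which positions attend freely within their own block and only backward across blocks, is shown here. The function name and the block size are hypothetical and are not taken from the Cola DLM repository.

```python
def block_causal_mask(seq_len, block_size):
    """Return mask[i][j] == True when position i may attend to position j.

    Assumption (not confirmed by the source): attention is bidirectional
    inside a block and causal between blocks, so position i sees every
    position whose block index is less than or equal to its own.
    """
    return [
        [(j // block_size) <= (i // block_size) for j in range(seq_len)]
        for i in range(seq_len)
    ]


if __name__ == "__main__":
    # Print a small mask: 1 = may attend, 0 = masked out.
    for row in block_causal_mask(seq_len=6, block_size=2):
        print("".join("1" if allowed else "0" for allowed in row))
```

Under this assumption, a block size of 1 reduces to an ordinary causal mask, and a block size equal to the sequence length reduces to full bidirectional attention.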
The diffusion process therefore operates on latent semantic representations rather than repeatedly denoising at the token level.
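To make the latent-space idea concrete, here is a toy sketch of Flow Matching on plain Python lists standing in for latent vectors. It assumes the widely used straight-line path between a noise sample and a clean latent. The article does not say which path or sampler Cola DLM actually uses, so the path, the Euler sampler, and all function names below are illustrative assumptions.

```python
import random


def fm_training_pair(z1, rng):
    """Build one Flow Matching training example for a clean latent z1.

    Assumed straight-line path: z_t = (1 - t) * z0 + t * z1, whose
    velocity along the path is the constant z1 - z0.
    """
    z0 = [rng.gauss(0.0, 1.0) for _ in z1]               # Gaussian noise sample
    t = rng.random()                                      # time in [0, 1)
    zt = [(1 - t) * a + t * b for a, b in zip(z0, z1)]    # point on the path
    target = [b - a for a, b in zip(z0, z1)]              # regression target
    return zt, t, target


def euler_sample(velocity_fn, z0, steps=10):
    """Integrate dz/dt = velocity_fn(z, t) from t = 0 to t = 1 with Euler steps."""
    z = list(z0)
    dt = 1.0 / steps
    for i in range(steps):
        t = i * dt
        v = velocity_fn(z, t)
        z = [zi + dt * vi for zi, vi in zip(z, v)]
    return z
```

In a real model, a trained network would play the role of `velocity_fn`, and the resulting latent would be handed to the conditional decoder to produce text. Nothing in this sketch touches tokens, which is the point the article is making.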
The open-source release is a 2B-class model with approximately 2.3 billion parameters in total: a core DiT with 1.8 billion parameters plus roughly 500 million for the VAE.
On LAMBADA, MMLU, OBQA, HellaSwag, RACE, SIQA, SQuAD, and Story Cloze, the paper reports that, under a unified generative evaluation protocol, Cola DLM scales competitively with same-size autoregressive (AR) and LLaDA baselines and achieves the best final average score.
However, it is currently still a research checkpoint, not a directly usable dialogue model.
The official note states that this model has not undergone instruction fine-tuning or RLHF, and its main purpose is to study how continuous latent diffusion can be used for text generation.
The paper also shows preliminary experiments extending to unified modeling of text and images, but this open-source repository only includes the text pipeline.
(Source: BlockBeats)
Comment
BreadthHunter
· 7h ago
It is on par with AR across eight benchmarks, but without RLHF it may still fall a bit short in actual use.
VineGeometry
· 7h ago
Is the block-causal design intended for long texts or efficiency? Please elaborate in the paper.
GateUser-a4680931
· 7h ago
Does diffusion at the latent semantic layer produce higher quality results than AR? Waiting for actual measurements.