Kaiming He's team releases ELF: a language diffusion model that finally works


According to Beating Monitoring, Kaiming He's team at MIT has released a language diffusion model, ELF (Embedded Language Flows). Instead of following the GPT-style autoregressive "predict the next token" approach, it carries out text generation entirely within a continuous embedding space, converting back to discrete tokens only at the final step.

Diffusion models are already mature in image generation, but applying them to text has always been awkward: images are naturally continuous signals, whereas language consists of discrete tokens. Previous continuous-diffusion text models typically either injected token-level supervision repeatedly during generation or required a separate, independently trained decoder. ELF's approach is cleaner: most steps denoise purely in the continuous vector space, and only the final step discretizes, using a shared-weight network.
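To make that pipeline concrete, here is a minimal, hypothetical PyTorch sketch of sampling in this style: iterate a denoiser purely in embedding space, and map to tokens only once at the end via a readout tied to the embedding table. The network, noise schedule, and update rule below are illustrative placeholders, not ELF's actual architecture or hyperparameters.

```python
# Hypothetical sketch: denoise in continuous embedding space, discretize once
# at the end with a readout that shares weights with the embedding table.
import torch
import torch.nn as nn

VOCAB, DIM, SEQ_LEN, NUM_STEPS = 50_000, 256, 64, 32

class Denoiser(nn.Module):
    """Toy stand-in for the denoising network (the real model would be a transformer)."""
    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(VOCAB, DIM)           # token embedding table (also the readout)
        self.net = nn.Sequential(nn.Linear(DIM + 1, 512), nn.GELU(),
                                 nn.Linear(512, DIM))   # predicts a cleaner embedding

    def forward(self, z, t):
        # Condition on the normalized timestep by concatenating it per position.
        t_feat = t.expand(z.shape[0], z.shape[1], 1)
        return self.net(torch.cat([z, t_feat], dim=-1))

    def discretize(self, z):
        # Shared-weight readout: score each position against the embedding
        # matrix and take the closest token.
        logits = z @ self.embed.weight.T                # (B, L, VOCAB)
        return logits.argmax(dim=-1)

@torch.no_grad()
def sample(model, batch=1):
    z = torch.randn(batch, SEQ_LEN, DIM)                # start from pure noise
    for step in reversed(range(1, NUM_STEPS + 1)):
        t = torch.tensor([[step / NUM_STEPS]])
        pred = model(z, t)                               # denoised estimate
        # Simple interpolation toward the prediction; a real sampler would
        # follow the model's trained noise schedule.
        z = z + (pred - z) / step
    return model.discretize(z)                           # tokens only at the end

tokens = sample(Denoiser())
print(tokens.shape)  # torch.Size([1, 64])
```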

The experimental results are also striking. On OpenWebText unconditional generation, ELF-B, with 105 million parameters, reached a generative perplexity (Gen. PPL) of about 24.1 in 32 sampling steps, outperforming a range of discrete and continuous diffusion language model baselines. More importantly, ELF-B was trained on only about 45 billion tokens, whereas the comparison methods typically used more than 500 billion, roughly an order of magnitude more. At minimum, this suggests that the continuous diffusion route is not blocked by the "discreteness of language"; earlier difficulties were more likely due to the modeling interface and sampling design.
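For reference, generative perplexity is conventionally computed by scoring a model's own samples with an external pretrained language model. The sketch below uses GPT-2 via Hugging Face transformers as the scorer; the specific evaluator and protocol used for ELF are not stated in this report and are assumptions here.

```python
# Assumed evaluation sketch: score generated texts with an external GPT-2 model.
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
scorer = GPT2LMHeadModel.from_pretrained("gpt2").eval()

@torch.no_grad()
def generative_perplexity(samples: list[str]) -> float:
    """Mean perplexity of generated texts under the external scorer."""
    total_nll, total_tokens = 0.0, 0
    for text in samples:
        ids = tokenizer(text, return_tensors="pt").input_ids
        # labels=ids makes the model return the mean next-token cross-entropy.
        loss = scorer(ids, labels=ids).loss
        n = ids.shape[1] - 1                  # number of predicted tokens
        total_nll += loss.item() * n
        total_tokens += n
    return float(torch.exp(torch.tensor(total_nll / total_tokens)))

print(generative_perplexity(["The quick brown fox jumps over the lazy dog."]))
```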
