Kaiming He's team releases ELF: a language diffusion model that finally works


According to Beating Monitoring, Kaiming He's team at MIT has released a language diffusion model, ELF (Embedded Language Flows). Instead of following the GPT-style autoregressive "predict the next token" approach, it carries out text generation entirely within a continuous embedding space, converting back to discrete tokens only at the final step.

Diffusion models are already mature in image generation, but applying them to text has always been awkward: images are naturally continuous signals, whereas language consists of discrete tokens. Previous continuous-diffusion text models typically either injected token-level supervision repeatedly during generation or required a separate, independently trained decoder. ELF's approach is cleaner: most steps denoise purely in the continuous vector space, and only the final step discretizes, using a shared-weight network.
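To make that pipeline concrete, here is a minimal, hypothetical PyTorch sketch of sampling in this style: iterate a denoiser purely in embedding space, and map to tokens only once at the end via a readout tied to the embedding table. The network, noise schedule, and update rule below are illustrative placeholders, not ELF's actual architecture or hyperparameters.

```python
# Hypothetical sketch: denoise in continuous embedding space, discretize once
# at the end with a readout that shares weights with the embedding table.
import torch
import torch.nn as nn

VOCAB, DIM, SEQ_LEN, NUM_STEPS = 50_000, 256, 64, 32

class Denoiser(nn.Module):
    """Toy stand-in for the denoising network (the real model would be a transformer)."""
    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(VOCAB, DIM)           # token embedding table (also the readout)
        self.net = nn.Sequential(nn.Linear(DIM + 1, 512), nn.GELU(),
                                 nn.Linear(512, DIM))   # predicts a cleaner embedding

    def forward(self, z, t):
        # Condition on the normalized timestep by concatenating it per position.
        t_feat = t.expand(z.shape[0], z.shape[1], 1)
        return self.net(torch.cat([z, t_feat], dim=-1))

    def discretize(self, z):
        # Shared-weight readout: score each position against the embedding
        # matrix and take the closest token.
        logits = z @ self.embed.weight.T                # (B, L, VOCAB)
        return logits.argmax(dim=-1)

@torch.no_grad()
def sample(model, batch=1):
    z = torch.randn(batch, SEQ_LEN, DIM)                # start from pure noise
    for step in reversed(range(1, NUM_STEPS + 1)):
        t = torch.tensor([[step / NUM_STEPS]])
        pred = model(z, t)                               # denoised estimate
        # Simple interpolation toward the prediction; a real sampler would
        # follow the model's trained noise schedule.
        z = z + (pred - z) / step
    return model.discretize(z)                           # tokens only at the end

tokens = sample(Denoiser())
print(tokens.shape)  # torch.Size([1, 64])
```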

The experimental results are also striking. On OpenWebText unconditional generation, ELF-B, with 105 million parameters, reached a generative perplexity (Gen. PPL) of about 24.1 in 32 sampling steps, outperforming a range of discrete and continuous diffusion language model baselines. More importantly, ELF-B was trained on only about 45 billion tokens, whereas the comparison methods typically used more than 500 billion, roughly an order of magnitude more. At minimum, this suggests that the continuous diffusion route is not blocked by the "discreteness of language"; earlier difficulties were more likely due to the modeling interface and sampling design.
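For reference, generative perplexity is conventionally computed by scoring a model's own samples with an external pretrained language model. The sketch below uses GPT-2 via Hugging Face transformers as the scorer; the specific evaluator and protocol used for ELF are not stated in this report and are assumptions here.

```python
# Assumed evaluation sketch: score generated texts with an external GPT-2 model.
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
scorer = GPT2LMHeadModel.from_pretrained("gpt2").eval()

@torch.no_grad()
def generative_perplexity(samples: list[str]) -> float:
    """Mean perplexity of generated texts under the external scorer."""
    total_nll, total_tokens = 0.0, 0
    for text in samples:
        ids = tokenizer(text, return_tensors="pt").input_ids
        # labels=ids makes the model return the mean next-token cross-entropy.
        loss = scorer(ids, labels=ids).loss
        n = ids.shape[1] - 1                  # number of predicted tokens
        total_nll += loss.item() * n
        total_tokens += n
    return float(torch.exp(torch.tensor(total_nll / total_tokens)))

print(generative_perplexity(["The quick brown fox jumps over the lazy dog."]))
```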
