ZAYA1-8B-Diffusion-Preview, the first diffusion language model in the AMD ecosystem, uses TiDAR to skip from-scratch pretraining and denoises 16 tokens in parallel per forward pass. It is an aggressive idea, and shifting the bottleneck from memory to compute is an insightful move.

MeNews
Zyphra releases the first diffusion language model in the AMD ecosystem, with up to 7.7x speedup
Zyphra has released ZAYA1-8B-Diffusion-Preview, which converts an autoregressive large language model into a mixture-of-experts diffusion model and is the first diffusion language model trained within the AMD hardware ecosystem. By using TiDAR to skip from-scratch pretraining, the model can denoise 16 tokens simultaneously in a single forward pass, shifting decoding from a memory bottleneck to a compute bottleneck. Empirical tests show a 4.6x speedup with CCA attention and lossless sampling, rising to 7.7x after switching to the hybrid logit sampler.
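To see why handling 16 tokens per forward pass moves the bottleneck from memory toward compute, here is a rough back-of-the-envelope sketch. It is not Zyphra's analysis. The function name and the cost model are assumptions: roughly 2 FLOPs per active parameter per token, 2-byte weights, and each active weight read from memory once per pass. It ignores KV-cache and activation traffic, as well as the extra expert weights a mixture-of-experts model may touch when different tokens are routed to different experts.

```python
def arithmetic_intensity(tokens_per_pass: int, bytes_per_param: float = 2.0) -> float:
    """Approximate FLOPs performed per byte of weights read in one forward pass.

    Assumed cost model (illustrative only):
      - 2 FLOPs per active parameter per token (one multiply, one add)
      - every active weight is read from memory exactly once per pass
    """
    flops_per_param = 2.0 * tokens_per_pass
    return flops_per_param / bytes_per_param


# One token per pass (plain autoregressive decoding) vs. a 16-token block.
single_token = arithmetic_intensity(1)
block_of_16 = arithmetic_intensity(16)
ratio = block_of_16 / single_token

print(f"1 token/pass:   {single_token:.1f} FLOPs per byte")
print(f"16 tokens/pass: {block_of_16:.1f} FLOPs per byte")
print(f"ratio:          {ratio:.0f}x")
```

Under these assumptions, the same weight traffic supports 16 times as much arithmetic. Whether that makes decoding fully compute-bound depends on the hardware and on how many of the 16 tokens are accepted per pass, which this sketch does not model.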