Zyphra has released ZAYA1-8B-Diffusion-Preview, a mixture-of-experts (MoE) diffusion model converted from an autoregressive large language model and the first diffusion language model trained within the AMD hardware ecosystem. By using TiDAR to skip from-scratch pretraining, it can denoise 16 tokens simultaneously in a single forward pass, turning a memory-bandwidth bottleneck into a compute bottleneck. Tests show a 4.6x decoding speedup with CCA attention and lossless sampling, rising to 7.7x with the hybrid logit sampler.

MeNews

2026-05-23 09:07:37


AIMPACT News, May 15 (UTC+8): according to Beating Monitoring, Zyphra has released ZAYA1-8B-Diffusion-Preview, a mixture-of-experts (MoE) diffusion model converted from an autoregressive large language model. Although the official announcement bills it as the "first" model to make this architectural conversion, teams such as SDAR and LLaDA 2.0 had already pioneered the approach at the end of last year. What is genuinely unique about ZAYA1 is that it is the first diffusion language model trained within the AMD hardware ecosystem.

Marketing rhetoric aside, the model still demonstrates the engineering efficiency gains of the diffusion architecture. Traditional autoregressive models generate serially, one token at a time, and the accumulating KV cache pushes generation speed up against its physical limits. As the He Kaiming team's pure diffusion model ELF recently signaled as an industry trend, parallel denoising is the key to breaking this bottleneck.

ZAYA1 adopts the TiDAR scheme to skip from-scratch pretraining. It denoises 16 candidate tokens simultaneously in a single forward pass, turning the VRAM bandwidth bottleneck into a compute bottleneck.
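The bandwidth-versus-compute argument can be illustrated with a rough roofline-style estimate. The sketch below is only a back-of-the-envelope model: the parameter count, precision, peak throughput, and memory bandwidth are hypothetical placeholders (not ZAYA1 or AMD figures), and it ignores KV-cache traffic, activations, and the fact that more positions may route to more MoE experts.

```python
def arithmetic_intensity(positions, bytes_per_param):
    """FLOPs performed per byte of weights read in one forward pass.

    Assumes ~2 FLOPs (multiply + add) per active parameter per position,
    and that the weights are read from memory once per pass.
    """
    return 2 * positions / bytes_per_param


def pass_time_s(positions, active_params, bytes_per_param,
                peak_flops, mem_bandwidth):
    """Roofline-style lower bound on the time of one forward pass (seconds)."""
    weight_bytes = active_params * bytes_per_param   # read once per pass
    flops = 2 * active_params * positions            # grows with positions
    memory_time = weight_bytes / mem_bandwidth
    compute_time = flops / peak_flops
    return max(memory_time, compute_time)


# Hypothetical, illustrative numbers only.
ACTIVE_PARAMS = 1e9      # active parameters per token (assumed)
BYTES_PER_PARAM = 2      # 16-bit weights (assumed)
PEAK_FLOPS = 1e15        # accelerator peak FLOP/s (assumed)
MEM_BANDWIDTH = 5e12     # accelerator memory bandwidth in bytes/s (assumed)

t_one = pass_time_s(1, ACTIVE_PARAMS, BYTES_PER_PARAM, PEAK_FLOPS, MEM_BANDWIDTH)
t_block = pass_time_s(16, ACTIVE_PARAMS, BYTES_PER_PARAM, PEAK_FLOPS, MEM_BANDWIDTH)

print(arithmetic_intensity(1, BYTES_PER_PARAM))   # FLOPs per byte, 1 position
print(arithmetic_intensity(16, BYTES_PER_PARAM))  # FLOPs per byte, 16 positions
print(t_block / t_one)                            # relative cost of the wider pass
```

Under these assumed numbers the 16-position pass costs about the same wall-clock time as the single-position pass, because both are bounded by reading the weights. That is the sense in which processing more positions per pass uses otherwise idle compute. Real speedups such as the reported 4.6x and 7.7x fall below the ideal 16x, presumably because not every candidate token is kept.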

In practical tests, combined with ZAYA1's dedicated CCA attention mechanism, a standard lossless sampler achieves a 4.6x decoding speedup without compromising generation quality. Switching to a hybrid logit sampler raises the speedup to 7.7x, a substantial cost reduction for time-consuming large-scale inference workloads. (Source: BlockBeats)
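The report does not describe how the lossless sampler works. One well-known way to keep parallel drafting lossless is the accept/reject rule from speculative sampling, in which drafted tokens are checked against the autoregressive distribution. The sketch below illustrates that general rule only; it is an assumption for illustration, not a confirmed description of ZAYA1's or TiDAR's sampler, and the function name and dictionary-based distributions are hypothetical.

```python
import random


def verify_draft(draft_tokens, draft_probs, target_probs, rng=random):
    """Accept or reject drafted tokens, speculative-sampling style.

    draft_tokens: tokens proposed in parallel, one per position.
    draft_probs[i]: dict token -> probability the drafter assigned at position i.
    target_probs[i]: dict token -> probability under the reference
        (autoregressive) distribution at position i.
    Returns the accepted prefix, plus one corrected token if a draft is rejected.
    """
    output = []
    for tok, q, p in zip(draft_tokens, draft_probs, target_probs):
        accept_prob = min(1.0, p.get(tok, 0.0) / q[tok])
        if rng.random() < accept_prob:
            output.append(tok)          # draft token kept as-is
            continue
        # Rejected: resample from the normalized residual max(p - q, 0),
        # which keeps the overall output distribution equal to p.
        residual = {t: max(p[t] - q.get(t, 0.0), 0.0) for t in p}
        tokens = list(residual)
        weights = [residual[t] for t in tokens]
        output.append(rng.choices(tokens, weights=weights)[0])
        break                           # later drafts depend on the rejected one
    return output


# Drafter and reference agree exactly: every drafted token is accepted.
same = [{"a": 0.5, "b": 0.5}, {"c": 1.0}]
print(verify_draft(["a", "c"], same, same))

# Reference gives the drafted token zero probability: it is replaced.
print(verify_draft(["a", "c"], [{"a": 1.0}, {"c": 1.0}], [{"b": 1.0}, {"c": 1.0}]))
```

The more drafted tokens survive verification per forward pass, the larger the speedup. The full speculative-sampling procedure also draws one extra token when every draft is accepted; that step is omitted here for brevity.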


Comment


GateUser-9008328f

· 7h ago

How many downstream tasks could be trained with the pre-training cost that TiDAR saves?


CrystalBallForSentiment

· 7h ago

The diffusion language model finally no longer has to worry about NV's attitude, which is a good thing.


GateUser-eccf92a1

· 7h ago

TiDAR skipping pre-training is so cost-effective; AMD's ecosystem finally has a competitive diffusion model.


GateUser-4aa73916

· 8h ago

Single forward pass can handle 16 tokens, making it perfect for latency-sensitive scenarios.


Semi-MeltedIceCream

· 8h ago

CCA Attention Lossless Sampling 4.6x, looking to write a technical blog with engineering details


MosaicButterfly

· 8h ago

16 tokens denoise simultaneously, trading memory for computing power; this approach is very friendly to consumer-grade cards.


LookingAtTheCandlestickChart

· 8h ago

Training on AMD rather than porting to it: the balance of influence in the ecosystem has started to shift.

