RAEv2 Open-Sourced: Convergence Speed Up 10x, 80 Training Epochs Surpass the Previous 800-Epoch Record

CryptoWorld News: RAEv2, an open-source project jointly released by Adobe Research, the Australian National University (ANU), Saining Xie's team at New York University (NYU), and other institutions, improves convergence speed tenfold, surpassing the previous 800-epoch record with only 80 training epochs. The representation autoencoder (RAE) is an image-reconstruction scheme for diffusion models that replaces the traditional variational autoencoder (VAE). The new version addresses pain points of the first generation: poor reconstruction quality, incompatibility with standard classifier-free guidance (CFG), and extremely slow convergence. On ImageNet, it reaches a generation FID (GFID) of 1.06 after only 80 training epochs.

On the architecture side, the research team reports three core optimizations. One is a multi-layer representation scheme that directly sums the outputs of the encoder's last K layers, which preserves the structure of the underlying subspace. The new architecture also clarifies how the representation autoencoder and representation alignment (REPA) complement each other, yielding stronger performance on generative tasks. In tests, reaching a GFID below 2 takes the first-generation model 177 epochs, while the new architecture needs only 35.
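The multi-layer summation described above can be illustrated with a minimal sketch. It is not the RAEv2 implementation: the function name `sum_last_k_layers`, the nested-list layout, and the toy values are all assumptions made for illustration. It only shows what "directly summing the outputs of the last K layers" means for per-token features, and that the summed latent keeps the same width as a single layer (unlike concatenation).

```python
from typing import List, Sequence

Vector = List[float]


def sum_last_k_layers(layer_outputs: Sequence[Sequence[Vector]], k: int) -> List[Vector]:
    """Element-wise sum of the last k per-layer token features.

    layer_outputs[l][t] is the feature vector of token t at encoder layer l.
    Hypothetical layout: every layer is assumed to have the same number of
    tokens and the same feature width.
    """
    if not 1 <= k <= len(layer_outputs):
        raise ValueError("k must be between 1 and the number of layers")
    selected = layer_outputs[-k:]          # the last k layers only
    num_tokens = len(selected[0])
    dim = len(selected[0][0])
    summed = [[0.0] * dim for _ in range(num_tokens)]
    for layer in selected:
        for t, token in enumerate(layer):
            for d, value in enumerate(token):
                summed[t][d] += value      # accumulate across layers
    return summed


# Toy encoder output (made-up numbers): 4 layers, 2 tokens, 3 features each.
toy_layers = [
    [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]],
    [[0.0, 2.0, 0.0], [0.0, 0.0, 2.0]],
    [[0.5, 0.5, 0.5], [1.0, 1.0, 1.0]],
    [[1.0, 1.0, 1.0], [2.0, 0.0, 0.0]],
]
latent = sum_last_k_layers(toy_layers, k=2)
print(latent)  # [[1.5, 1.5, 1.5], [3.0, 1.0, 1.0]]
```

With `k=2`, only the last two layers contribute, and the result still has 2 tokens of width 3.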
GateUser-b6d80ba0
· 9h ago
Adding the last K layers of the encoder together has a bit of a ResNet skip connection feel, but applied in the latent space
NeonVortexInTheSmog
· 9h ago
Diffusion reconstruction + CFG compatibility, clearing technical debt in one go
CyberBridgeDeepPerspective
· 9h ago
GFID < 2 at epoch 35; this kind of efficiency will thrill anyone who trains models
RevokingPermissionsOnARainy
· 9h ago
Someone finally took the issue of VAE reconstruction blurriness seriously, tearing up.
HoldingPositionsIsLikeTending
· 9h ago
Adobe + ANU + NYU three partners join forces, maximizing resources
CandleAfterTheRain
· 9h ago
The multi-layer representation preserves the underlying structure; this design is carefully thought out, not just a matter of stacking more depth.
BitByBitBenny
· 9h ago
GFID 1.06 only 80 rounds, the previous generation 177 rounds was cut off halfway, convergence speed skyrocketing
GateUser-0f8d377b
· 9h ago
Xie Saining's team has connected reconstruction and generation this time; the REPA complementary mechanism has some substance.
Salt-BakedSentimentChart
· 9h ago
Using diffusion models as a VAE is indeed a wild idea.