Sakana AI tackles the memory bottleneck of deep models: large networks can be trained with 1/B of the usual memory, and in experiments the method matches or even outperforms end-to-end training.

MeNews
Sakana AI Launches DiffusionBlocks: Independent Block Training Cuts VRAM to 1/B
At ICLR 2026, Sakana AI and the University of Tokyo proposed DiffusionBlocks, a method that partitions a network into B blocks and maps each block's update to a step in the reverse denoising process of a diffusion model, so that every block can be trained independently. Only one block is loaded during training and the other blocks occupy no GPU memory, which reduces memory usage to 1/B of the original and removes the memory bottleneck caused by depth. Experiments on Vision Transformers, DiT image generation, and text generation show that block-wise training can match or even surpass end-to-end training. For Looped Transformers, the method can also use forward-only (unidirectional) updates, which significantly reduces training computation.
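As a rough illustration of where the 1/B figure comes from, the Python sketch below counts how many layers must be resident at once under end-to-end training versus block-wise training. It is only an accounting exercise under stated assumptions, not Sakana AI's implementation: the function names and the 12-layer, 4-block configuration are hypothetical, all layers are assumed to be the same size, and activations, optimizer state, and the diffusion objective itself are ignored.

```python
def partition_layers(num_layers, num_blocks):
    """Split layer indices 0..num_layers-1 into contiguous blocks (hypothetical helper)."""
    base, extra = divmod(num_layers, num_blocks)
    blocks, start = [], 0
    for b in range(num_blocks):
        # Spread any remainder over the first `extra` blocks.
        size = base + (1 if b < extra else 0)
        blocks.append(list(range(start, start + size)))
        start += size
    return blocks


def peak_layers_resident(blocks, blockwise):
    """Number of layers held in memory at the worst moment of training."""
    if blockwise:
        # Assumption: only the block currently being trained is loaded.
        return max(len(block) for block in blocks)
    # End-to-end training keeps every layer loaded.
    return sum(len(block) for block in blocks)


# Assumed toy configuration: 12 equal-sized layers split into B = 4 blocks.
blocks = partition_layers(num_layers=12, num_blocks=4)
e2e = peak_layers_resident(blocks, blockwise=False)
bw = peak_layers_resident(blocks, blockwise=True)
print(blocks)
print(f"end-to-end: {e2e} layers, block-wise: {bw} layers, ratio: {bw / e2e}")
```

With equal-sized blocks the ratio is exactly 1/B. With uneven blocks the largest block sets the peak, so the saving would be somewhat smaller.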