Trained from scratch on 128 A100 GPUs! ByteDance open-sources Lance, a 3B all-round multimodal model

ME News, May 19 (UTC+8). According to Beating Monitoring, ByteDance Research has officially open-sourced Lance, a native unified multimodal large model. It is a lightweight model with only 3B activated parameters that supports image and video understanding, generation, and editing within a single framework. Today's mainstream unified models largely rely on scaling up parameter counts or reusing text-to-image architectures; Lance instead takes a route built on multi-task collaboration at very low compute cost. The R&D team trained the model entirely from scratch and kept the compute budget for the whole training cycle to 128 A100 GPUs. To address conflicts among different modalities and tasks, Lance makes two hard separations at the architectural level:

  • It adopts a dual-stream mixture-of-experts (MoE) architecture to process interleaved multimodal sequences, sharing the underlying context while decoupling the computation paths for understanding and generation.
  • It introduces modality-aware rotary position encoding, which directly reduces interference between the heterogeneous visual tokens of images and videos.
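
The two separations above can be pictured with a toy sketch. The report gives no implementation details, so everything below is an assumption made for illustration: the stream labels, the stand-in expert functions, and the per-modality frequency bases are hypothetical and are not taken from Lance's code. The sketch hard-routes each token to an expert chosen by its stream (shared attention over the common context is omitted) and applies standard rotary position encoding with a base selected per modality.

```python
import math

# Hypothetical per-modality frequency bases; the article gives no values.
MODALITY_BASE = {"text": 10000.0, "image": 5000.0, "video": 20000.0}


def rope_rotate(vec, pos, base):
    """Apply standard rotary position encoding to one even-length vector."""
    dim = len(vec)
    if dim % 2 != 0:
        raise ValueError("vector length must be even")
    out = []
    for i in range(0, dim, 2):
        # Pair (i, i+1) is rotated by an angle that grows with the position.
        theta = pos * base ** (-i / dim)
        c, s = math.cos(theta), math.sin(theta)
        x, y = vec[i], vec[i + 1]
        out.extend([x * c - y * s, x * s + y * c])
    return out


def modality_aware_rope(vec, pos, modality):
    """Assumed variant: pick the rotary base from the token's modality."""
    return rope_rotate(vec, pos, MODALITY_BASE[modality])


def understanding_expert(h):
    # Stand-in for the understanding stream's feed-forward expert.
    return [2.0 * x for x in h]


def generation_expert(h):
    # Stand-in for the generation stream's feed-forward expert.
    return [x + 1.0 for x in h]


EXPERTS = {"understand": understanding_expert, "generate": generation_expert}


def dual_stream_ffn(hidden_states, streams):
    """Hard-route each token to the expert of its stream (no learned gate)."""
    return [EXPERTS[s](h) for h, s in zip(hidden_states, streams)]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


if __name__ == "__main__":
    q = [1.0, 0.0, 0.5, -0.5]
    print(modality_aware_rope(q, 3, "image"))
    print(modality_aware_rope(q, 3, "video"))
    print(dual_stream_ffn([q, q], ["understand", "generate"]))
```

Within one modality the rotated vectors keep the usual rotary property that their dot product depends only on the relative position, while tokens from different modalities are rotated at different rates. Whether Lance separates modalities through the base, through position offsets, or through some other mechanism is not stated in the report.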

Despite the tightly constrained compute, performance has not been sacrificed. With only 3B activated parameters, Lance's image and video generation and editing results lead existing open-source unified models on the vast majority of benchmarks. Through multi-task collaboration, it demonstrates a low-cost approach that balances generation with semantic understanding.

(Source: BlockBeats)
