Trained from scratch on 128 A100s: ByteDance open-sources Lance, a 3B all-round multimodal model

ME News, May 19 (UTC+8). According to Beating Monitoring, ByteDance Research has officially open-sourced Lance, its native unified multimodal large model. Lance is a lightweight model with only 3B activated parameters that supports image and video understanding, generation, and editing within a single framework. Most mainstream unified models currently rely on scaling up parameter counts or reusing text-to-image architectures; Lance instead takes a low-compute, collaborative route.

The research and development team trained the model entirely from scratch, keeping the total compute budget for the full training cycle to the equivalent of 128 A100 GPUs. To resolve conflicts between modalities and tasks, Lance introduces two strict forms of architectural isolation:

  • It adopts a dual-stream mixture-of-experts (MoE) architecture to process interleaved multimodal sequences, sharing the underlying context while decoupling the computational paths for understanding and generation.
  • It introduces modality-aware rotary position encoding, which directly reduces signal interference between heterogeneous visual tokens from images and videos.
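
To make the second point more concrete, here is a minimal sketch of one way a rotary position encoding could be made modality-aware. The report does not say how Lance does this; the per-modality position offsets, the `MODALITY_OFFSET` table, and the function names below are all assumptions made for illustration, not Lance's actual design.

```python
import math

# Hypothetical per-modality position offsets. The source does not describe
# Lance's actual scheme; these values only illustrate the general idea.
MODALITY_OFFSET = {"text": 0, "image": 1000, "video": 2000}


def rope_rotate(vec, position, base=10000.0):
    """Apply standard rotary position encoding to an even-length vector."""
    dim = len(vec)
    if dim % 2 != 0:
        raise ValueError("vector length must be even")
    out = []
    for i in range(0, dim, 2):
        # Each pair of coordinates is rotated by a position-dependent angle.
        theta = position * base ** (-i / dim)
        c, s = math.cos(theta), math.sin(theta)
        x, y = vec[i], vec[i + 1]
        out.extend([x * c - y * s, x * s + y * c])
    return out


def modality_aware_rope(vec, position, modality):
    """Shift the position by an assumed modality offset, then rotate."""
    return rope_rotate(vec, MODALITY_OFFSET[modality] + position)


def dot(a, b):
    """Plain dot product, standing in for an attention score."""
    return sum(x * y for x, y in zip(a, b))
```

Under this assumption, two tokens of the same modality still score according to their relative distance only, as in ordinary rotary encoding, while image and video tokens occupy disjoint position ranges and so are rotated differently from each other.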

Aggressive compute compression has not lowered the performance ceiling. With only 3B activated parameters, Lance leads existing open-source unified models on most image and video generation and editing benchmarks. Through multi-task collaboration, it offers a low-cost approach that balances generation with semantic understanding. (Source: BlockBeats)
