Tsinghua releases LCM: compatible with all SD base models, LoRAs, plug-ins, and more

Source: New Zhiyuan

Author: Tan Weida

Edit: LRS is sleepy

Image source: generated by Unbounded AI

Latent Consistency Models (LCM) is an image generation architecture whose main highlight is generation speed.

Unlike traditional diffusion models such as Stable Diffusion, which require many iterations, LCM can achieve in just 1 to 4 steps what takes a traditional model about 30 steps.

Invented by Luo Simian and Tan Yiqin, graduate students at Tsinghua University’s Institute for Interdisciplinary Information Sciences, LCM speeds up text-to-image generation by 5 to 10 times, bringing the world into the era of real-time generative AI.
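The step reduction (about 30 steps down to 1 to 4) and the 5-10x end-to-end figure are related but not identical, because every image also pays a fixed cost that does not shrink with the step count, such as text encoding and image decoding. The following is a minimal sketch of that arithmetic. The per-step and fixed timings are illustrative assumptions, not measurements from the LCM authors.

```python
def end_to_end_speedup(baseline_steps, lcm_steps, step_ms, fixed_ms):
    """Ratio of total latency: baseline sampler vs. few-step LCM sampler.

    step_ms  -- assumed cost of one denoising step (hypothetical value)
    fixed_ms -- assumed per-image overhead independent of the step count
    """
    baseline_total = fixed_ms + baseline_steps * step_ms
    lcm_total = fixed_ms + lcm_steps * step_ms
    return baseline_total / lcm_total


# Hypothetical timings: 50 ms per step, 100 ms fixed overhead per image.
for steps in (1, 2, 4):
    ratio = end_to_end_speedup(30, steps, step_ms=50, fixed_ms=100)
    print(f"{steps} step(s): {ratio:.1f}x faster than 30 steps")
```

Under these assumed timings the ratio runs from roughly 5x at 4 steps to roughly 11x at 1 step, which shows how a 30-to-4 step reduction can land in the 5-10x range rather than at a flat 7.5x.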

LCM-LoRA:

Project Homepage:

Stable Diffusion killer: LCM

Prior to the advent of LCM, different teams explored a variety of SD1.5 and SDXL alternatives in various directions.

Each of these projects has its own strengths, but they all share the flaws of being incompatible with LoRA and not fully compatible with the Stable Diffusion ecosystem. In chronological order, the more important projects are:

Then LCM-LoRA appeared: by distilling SD1.5, SSD1B, and SDXL into LCM in the form of a LoRA, it brings a 5x generation speedup to all SDXL models and stays compatible with all existing LoRAs, at the cost of a small loss in generation quality. The project quickly gained support from a large number of plug-ins and distributions in the Stable Diffusion ecosystem.
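To illustrate how such an adapter plugs into an existing pipeline, here is a minimal sketch using the Hugging Face diffusers library, which the article does not name. The model and adapter identifiers, the step count, and the guidance value are assumptions chosen for the example. Actually running it requires diffusers, torch, a CUDA GPU, and a model download.

```python
# Settings assumed for this sketch; they are not taken from the article.
BASE_MODEL = "stabilityai/stable-diffusion-xl-base-1.0"
LCM_LORA = "latent-consistency/lcm-lora-sdxl"
LCM_STEPS = 4        # within the 1-4 step range the article cites
LCM_GUIDANCE = 1.0   # assumed low guidance value for few-step sampling


def build_lcm_pipeline(base_model=BASE_MODEL, lcm_lora=LCM_LORA):
    """Load an SDXL pipeline, swap in the LCM scheduler, attach the LCM-LoRA."""
    # Imported lazily so this module loads even without the heavy dependencies.
    import torch
    from diffusers import DiffusionPipeline, LCMScheduler

    pipe = DiffusionPipeline.from_pretrained(base_model, torch_dtype=torch.float16)
    # Replace the default sampler with the LCM scheduler, reusing its config.
    pipe.scheduler = LCMScheduler.from_config(pipe.scheduler.config)
    # The acceleration adapter is loaded like any other LoRA.
    pipe.load_lora_weights(lcm_lora)
    return pipe.to("cuda")


def generate(pipe, prompt):
    """Run few-step sampling and return the first generated image."""
    result = pipe(
        prompt=prompt,
        num_inference_steps=LCM_STEPS,
        guidance_scale=LCM_GUIDANCE,
    )
    return result.images[0]
```

With the dependencies installed, `generate(build_lcm_pipeline(), "a lighthouse at dusk")` would return an image that can be saved with `.save(...)`. The point of the sketch is that the base model is left untouched: only the scheduler is swapped and one extra LoRA is loaded.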

The LCM team has also published training scripts that support training your own full LCM models (such as LCM-SDXL) or your own LCM-LoRA, so that quality and speed can both be achieved. With a single training run, generation can be sped up by up to 5 times while maintaining output quality.

At this point, the LCM ecosystem has taken shape as the prototype of a complete replacement for SD.

As of 2023/11/22, the following open source projects support LCM:

Projects that plan to add support:

As the ecosystem gradually develops, LCM has the potential to completely replace Stable Diffusion as the underlying layer for a new generation of image generation.

Future outlook

Since the release of Stable Diffusion, the cost of image generation has come down slowly; the emergence of LCM has cut it directly by an order of magnitude. Every time a revolutionary technology emerges, it brings a wealth of opportunities to reshape the industry. LCM can bring significant changes to the industrial landscape in at least three areas: the disappearance of image generation costs, text-to-video generation, and real-time generation.

1. Image generation costs disappear

On the consumer (To C) product side, free services will replace paid ones. Constrained by the high cost of GPU compute, many text-to-image services, with Midjourney as the leading example, have chosen freemium as their business model. LCM makes it possible for mobile clients, PC CPUs, browsers (WebAssembly), and more elastically scalable CPU compute to meet the compute needs of image generation in the future. Simple paid services such as Midjourney will be replaced by high-quality free services.

On the business (To B) server side, the reduced demand for generation compute will be replaced by increased demand for training compute.

Demand for compute in AI image generation services swings widely between peaks and troughs, and purchased servers typically sit idle more than 50% of the time. This has driven the rapid growth of serverless (function compute) GPU offerings, such as Replicate in the United States and Alibaba Cloud in China.
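The effect of idle time on cost can be made concrete with a little arithmetic. The sketch below compares a reserved server with pay-per-use billing. The hourly prices are made-up placeholders, not quotes from Replicate or Alibaba Cloud.

```python
def cost_per_busy_hour(hourly_price, utilization):
    """Effective price of one hour of useful work on a reserved server."""
    if not 0 < utilization <= 1:
        raise ValueError("utilization must be in (0, 1]")
    return hourly_price / utilization


def break_even_utilization(reserved_price, pay_per_use_price):
    """Utilization below which pay-per-use billing is the cheaper option."""
    return reserved_price / pay_per_use_price


# Hypothetical prices: 1.00 per hour reserved, 1.60 per hour billed only while busy.
reserved, pay_per_use = 1.00, 1.60
effective = cost_per_busy_hour(reserved, utilization=0.5)
print("effective reserved cost per busy hour:", effective)
print("pay-per-use is cheaper:", pay_per_use < effective)
```

A server that is idle half the time costs twice its list price per hour of useful work, so under these assumed prices pay-per-use billing wins even at a 60% premium.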

In hardware virtualization, providers such as Rayvision and Tencent Cloud in China have also ridden this wave by launching virtual desktop products for image model training. As generation workloads move to edge devices, clients, or more easily scaled CPU compute, AI image generation will spread to a wide range of application scenarios, and demand for fine-tuning image models will rise significantly. In the imaging field, professional, easy-to-use, vertical model training services will become the main consumers of cloud GPU compute in the next stage.

2. Text-to-video

At present, the extremely high cost of text-to-video generation holds back the development and adoption of the technology, and consumer-grade graphics cards can only render slowly, frame by frame. A number of projects, led by the AnimateDiff WebUI plug-in, have prioritized LCM support, enabling more people to take part in open source text-to-video projects. The lower barrier to entry will inevitably accelerate the adoption and development of text-to-video.

3-minute fast rendering: AnimateDiff Vid2Vid + LCM

3. Real-time rendering

The increase in speed has led to a plethora of new applications that are expanding the imagination of all.

RT-LCM vs. AR

Led by RealTime LCM, real-time video generation at about 10 frames per second has been achieved on consumer-grade GPUs for the first time, which is bound to have a far-reaching impact in the AR field.
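Ten frames per second leaves a budget of about 100 ms per frame, which is why the step count matters so much for real-time use. A rough sketch of that budget follows. The fixed per-frame overhead is an assumed figure, not a measurement of RealTime LCM.

```python
def per_step_budget_ms(fps, steps, overhead_ms=0.0):
    """Time available for each denoising step at a target frame rate."""
    frame_budget = 1000.0 / fps
    remaining = frame_budget - overhead_ms
    if remaining <= 0:
        raise ValueError("overhead alone exceeds the frame budget")
    return remaining / steps


# Assumed 20 ms per frame for capture, encoding, decoding, and display.
for steps in (1, 2, 4, 30):
    budget = per_step_budget_ms(10, steps, overhead_ms=20)
    print(f"{steps:>2} steps -> {budget:.1f} ms per step")
```

Under this assumption a 4-step sampler has 20 ms per step to work with, while a 30-step sampler would have under 3 ms per step.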

At present, capturing and redrawing the entire scene in the field of view at high definition and low latency requires extremely high compute, so AR applications have so far focused mainly on adding new objects, or on redrawing some objects in low definition after feature extraction. LCM makes it possible to redraw entire scenes in real time, opening unlimited room for imagination in games, interactive movies, social interaction, and more.

In the future, there will be no need to build new scenes: put on AR glasses and the streets will instantly transform into a neon-lit cyberpunk world for players to explore. When you watch an interactive horror movie of the future, put on AR glasses and everything familiar in your home will blend seamlessly into the scene, with the scary things hidden behind your bedroom door. The virtual and the real will merge seamlessly, making it increasingly difficult to tell reality from dream. And all of this is likely to have LCM underneath.

RT-LCM video rendering

Interaction: what you imagine is what you get

Real-time image editing UIs, first productized by Krea.ai and ilumine.ai, once again lower the threshold for creation and expand the boundaries of creativity, letting more people get real-time feedback on the final image while retaining fine-grained control.

Krea.ai real-time image editing

Real-time image editing

Modeling software combined with LCM opens a new direction for 3D modeling, letting 3D modelers go one step beyond WYSIWYG toward "what you imagine is what you get".

LCM real-time spatial modeling rendering

Hands are the most useless part of a human being, because they can never keep up with the speed of the brain. "What you see is what you get" is too slow; "what you imagine is what you get" will become the mainstream of creative work in the future.

For the first time, LCM lets the rendered result keep pace with the speed at which ideas are generated. New modes of interaction keep emerging, and the end point of the AIGC revolution is to push the cost and technical threshold of creativity infinitely close to zero. Whatever the industry, good ideas will go from scarce to abundant. LCM takes us one step further into the future.

Anyone interested in LCM is welcome to join the LCM Chinese group:

Resources:
