Network Innovation in the AI Era: Cost Reduction, Openness, and Computing Power Balance Become Key


The Importance of Networks in the AI Era and Directions for Innovation

The network has become a key link in the era of large AI models. As models grow rapidly, training them on multi-server clusters has become the mainstream approach, and this is what underpins the network's “rise” in the AI era. In the past, networks mainly just moved data; today they are increasingly used to synchronize model parameters between graphics cards (GPUs), which places higher demands on network density and capacity.
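To make “synchronizing model parameters between graphics cards” concrete, here is a minimal sketch of ring all-reduce, one common collective used in data-parallel training to sum gradients across GPUs. It is an illustration under stated assumptions: each “GPU” is a plain Python list, the names `ring_allreduce` and `per_rank_traffic` are hypothetical, and real clusters rely on optimized communication libraries (for example NVIDIA's NCCL) rather than code like this.

```python
"""Minimal ring all-reduce simulation (illustrative sketch only).

Assumption: each simulated GPU ("rank") holds its local gradients as a list
of numbers; real systems run this over NVLink, InfiniBand or Ethernet.
"""
from typing import List


def ring_allreduce(buffers: List[List[float]]) -> List[List[float]]:
    """Sum the buffers element-wise so that every rank ends with the total."""
    n = len(buffers)
    size = len(buffers[0])
    assert size % n == 0, "for simplicity, buffer length must divide evenly by n"
    chunk = size // n
    data = [list(b) for b in buffers]  # copy so the inputs are left untouched

    def span(c: int) -> range:
        return range(c * chunk, (c + 1) * chunk)

    # Phase 1, reduce-scatter: in step s, rank r sends chunk (r - s) mod n to
    # rank r + 1, which adds it into its own copy of that chunk.
    for s in range(n - 1):
        sends = [(r, (r - s) % n) for r in range(n)]
        payloads = [[data[r][i] for i in span(c)] for r, c in sends]
        for (r, c), vals in zip(sends, payloads):
            dst = (r + 1) % n
            for i, v in zip(span(c), vals):
                data[dst][i] += v

    # Now rank r holds the fully reduced chunk (r + 1) mod n.
    # Phase 2, all-gather: in step s, rank r forwards chunk (r + 1 - s) mod n
    # to rank r + 1, which overwrites its stale copy.
    for s in range(n - 1):
        sends = [(r, (r + 1 - s) % n) for r in range(n)]
        payloads = [[data[r][i] for i in span(c)] for r, c in sends]
        for (r, c), vals in zip(sends, payloads):
            dst = (r + 1) % n
            for i, v in zip(span(c), vals):
                data[dst][i] = v
    return data


def per_rank_traffic(size: int, n: int) -> float:
    """Elements each rank sends: 2 * (n - 1) chunks of size / n elements."""
    return 2 * (n - 1) * size / n


if __name__ == "__main__":
    grads = [[1.0, 2.0, 3.0, 4.0], [10.0, 20.0, 30.0, 40.0]]
    print(ring_allreduce(grads))  # → [[11.0, 22.0, 33.0, 44.0], [11.0, 22.0, 33.0, 44.0]]
```

In this scheme each GPU sends about 2(n-1)/n times its buffer size per synchronization. That stays roughly constant as n grows, but it recurs on every training step, which is why the bandwidth and latency of the fabric matter so much.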

Network demand mainly comes from three aspects:

  1. Ever-growing model size. For a fixed amount of training data, training time is proportional to the number of model parameters and inversely proportional to computing speed. Shortening training time therefore requires using the network to connect more devices and improving the parallel efficiency across them.

  2. Complex multi-card synchronization. Once a model is split across many cards, the cards must be synchronized after each computation step, which places higher demands on network transmission and switching.

  3. Increasingly expensive failures. Large-model training runs often last for months, and an interruption can cost significant progress and money. Modern AI networks have become a crystallization of human systems-engineering capability, comparable to airplanes and aircraft carriers.
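The first and third points can be put into rough numbers. The sketch below assumes the widely used approximation that training a dense transformer takes about 6 × parameters × tokens FLOPs, folds communication and parallelization overhead into a single `efficiency` factor, and assumes each failure loses on average half a checkpoint interval plus a fixed restart time. The function names and all inputs in the usage lines are hypothetical, not figures from the report.

```python
"""Back-of-envelope estimate of training time and the cost of interruptions.

Assumptions (illustrative only):
- training compute ~= 6 * parameters * tokens FLOPs (dense transformer rule of thumb);
- each GPU sustains peak_flops * efficiency useful FLOP/s, where `efficiency`
  absorbs parallel and communication overhead;
- a failure loses half a checkpoint interval on average, plus a restart time.
"""

SECONDS_PER_DAY = 86_400


def training_days(params: float, tokens: float, gpus: int,
                  peak_flops: float, efficiency: float) -> float:
    """Wall-clock days to finish the run under the assumptions above."""
    total_flops = 6.0 * params * tokens
    cluster_flops_per_s = gpus * peak_flops * efficiency
    return total_flops / cluster_flops_per_s / SECONDS_PER_DAY


def expected_lost_gpu_hours(failures: int, checkpoint_interval_h: float,
                            restart_h: float, gpus: int) -> float:
    """GPU-hours wasted by failures in a synchronous job.

    On average a failure lands halfway through a checkpoint interval, and every
    GPU in the job sits idle while the run restarts from the last checkpoint.
    """
    per_failure_h = checkpoint_interval_h / 2 + restart_h
    return failures * per_failure_h * gpus


if __name__ == "__main__":
    # Hypothetical inputs: 70B parameters, 2T tokens, 1e15 FLOP/s peak per GPU, 40% efficiency.
    print(round(training_days(70e9, 2e12, 1024, 1e15, 0.4), 1))  # → 23.7
    print(round(training_days(70e9, 2e12, 2048, 1e15, 0.4), 1))  # → 11.9
    print(expected_lost_gpu_hours(10, 2.0, 1.0, 1024))            # → 20480.0
```

Under these assumptions, doubling the GPU count halves the run only if `efficiency` holds steady, which is what the network has to deliver; and because one failure stalls every GPU in a synchronous job, lost time grows with cluster size.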

Future network innovation will revolve around three directions: “cost reduction”, “openness”, and balancing computing power with scale:

  1. Communication Medium Replacement: While optical modules push for higher speeds, they are also exploring cost-reduction routes such as LPO (linear-drive pluggable optics), LRO (linear receive optics), and silicon photonics. Copper cables dominate connections within the cabinet thanks to their cost-performance advantage. New semiconductor technologies such as chiplets and wafer-scale integration are accelerating the exploration of the limits of silicon-based interconnects.

  2. Network Protocol Competition: Inter-chip communication protocols, such as NVLink and Infinity Fabric, are tightly bound to each vendor's GPUs. Communication between nodes centers on the competition between InfiniBand (IB) and Ethernet.

  3. Changes in Network Architecture: The leaf-spine architecture is currently the most widely adopted, but as clusters grow, new architectures such as Dragonfly and rail-only are expected to become the evolutionary direction for ultra-large clusters.
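To show why cluster scale drives the choice of architecture, the sketch below compares how many hosts each topology can reach with switches of a given port count (radix). It assumes textbook, non-blocking configurations: a two-tier leaf-spine, a three-tier k-ary fat tree, and a Dragonfly with p hosts per router, a routers per group, and h global links per router. The function names are hypothetical, and rail-only designs are not modeled here.

```python
"""Rough host-count limits for common data-center topologies (illustrative sketch).

Assumptions: identical switches, non-blocking (1:1) designs, and the textbook
formulas for each topology; real deployments add oversubscription and cabling limits.
"""


def leaf_spine_max_hosts(k: int) -> int:
    # Two-tier leaf-spine with k-port switches: each leaf uses k/2 ports for hosts
    # and k/2 uplinks (one per spine); each spine's k ports reach at most k leaves.
    return k * (k // 2)


def fat_tree_max_hosts(k: int) -> int:
    # Three-tier k-ary fat tree: k pods, each with (k/2)^2 hosts -> k^3 / 4 hosts.
    return k ** 3 // 4


def dragonfly_radix(p: int, a: int, h: int) -> int:
    # Ports per Dragonfly router: p to hosts, a - 1 to the other routers in its
    # group, h global links to other groups.
    return p + (a - 1) + h


def dragonfly_max_hosts(p: int, a: int, h: int) -> int:
    # With at least one global link between every pair of groups, a group's
    # a * h global links support at most a * h + 1 groups.
    groups = a * h + 1
    return p * a * groups


if __name__ == "__main__":
    k = 64
    print(leaf_spine_max_hosts(k))            # → 2048
    print(fat_tree_max_hosts(k))              # → 65536
    print(dragonfly_radix(16, 32, 16))        # → 63, fits a 64-port switch
    print(dragonfly_max_hosts(16, 32, 16))    # → 262656
```

With 64-port switches, for instance, a two-tier leaf-spine tops out at 2,048 hosts and a three-tier fat tree at 65,536, while a balanced Dragonfly (p = h = 16, a = 32) reaches 262,656.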

Investment advice: companies to focus on:

Core segments of the communication system: Zhongji Xuchuang, Xinyi Sheng, Tianfu Communication, Industrial Fulian, Yingweike, Hudian Co., Ltd.

Innovative segments of the communication system: Changfei Optical Fiber, Taichenglight, Yuanjie Technology, Shengke Communication-U, Cambrian, Dekoli.

Risk warning: AI demand falling short of expectations, scaling laws ceasing to hold, and intensifying industry competition.

This page may contain third-party content, which is provided for information purposes only (not representations/warranties) and should not be considered as an endorsement of its views by Gate, nor as financial or professional advice. See Disclaimer for details.