OpenAI leads a rare alliance among giants AMD, NVIDIA, Intel, Microsoft, and Broadcom to dominate the AI network layer


Golden Finance reports (citing Kuai Technology, May 7th) that OpenAI has officially released the MRC (Multi-Path Reliable Connection) protocol through the Open Compute Project (OCP), targeting GPU network communication bottlenecks in large-scale AI training. The protocol was jointly developed over two years by OpenAI, AMD, NVIDIA, Intel, Microsoft, and Broadcom, and is already in production use in supercomputing clusters equipped with NVIDIA GB200.
The core issue MRC aims to solve: during large-scale AI model training, a single delayed data transfer can stall the entire training step, leaving GPUs idle while they wait, and as cluster size grows, delays caused by network congestion, link failures, and device failures become more frequent. MRC's approach is to split a single 800Gb/s network interface into multiple smaller links — for example, connecting one interface to 8 different switches to form 8 independent 100Gb/s parallel networks — rather than relying on one monolithic 800Gb/s path, so that a problem on any single link only affects that link's share of the traffic.
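The multi-path idea described above can be sketched in a few lines. This is an illustrative simulation only — the function and class names are hypothetical, and the real MRC protocol's framing, retransmission, and load-balancing logic are not detailed in the article — but it shows the core benefit: when a transfer is striped across several independent links, a single link failure costs one slice, which can be resent over a surviving path instead of stalling the whole transfer.

```python
# Hypothetical sketch of multi-path striping with retransmission
# (illustrative names; not the published MRC protocol).
from dataclasses import dataclass, field

@dataclass
class Path:
    """One of several independent links between an interface and a switch."""
    path_id: int
    healthy: bool = True
    delivered: list = field(default_factory=list)

    def send(self, chunk: bytes) -> bool:
        # A failed link drops the chunk; a healthy one delivers it.
        if self.healthy:
            self.delivered.append(chunk)
        return self.healthy

def multipath_send(payload: bytes, paths: list) -> bytes:
    """Stripe payload across paths; resend lost slices on a healthy path."""
    n = len(paths)
    size = -(-len(payload) // n)  # ceiling division: bytes per slice
    slices = [payload[i * size:(i + 1) * size] for i in range(n)]
    received = [None] * n
    for i, (chunk, path) in enumerate(zip(slices, paths)):
        if path.send(chunk):
            received[i] = chunk
    # Slices lost on failed links are retransmitted over a surviving path,
    # so one bad link delays only its own slice, not the whole transfer.
    healthy = [p for p in paths if p.healthy]
    for i, chunk in enumerate(slices):
        if received[i] is None and healthy:
            if healthy[0].send(chunk):
                received[i] = chunk
    return b"".join(c for c in received if c is not None)

# One transfer striped over 8 paths, with one simulated link failure.
paths = [Path(i) for i in range(8)]
paths[3].healthy = False
data = bytes(range(64)) * 4  # 256-byte payload, 32 bytes per slice
assert multipath_send(data, paths) == data
```

The 8-way split mirrors the 800Gb/s-into-8×100Gb/s example: the interface's bandwidth is divided evenly across the links, and recovery happens per slice rather than per transfer.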
