On May 6, leading artificial intelligence company OpenAI, together with industry giants such as AMD, Broadcom, Intel, Microsoft, and NVIDIA, jointly launched a new open network protocol called "Multipath Reliable Connection" (MRC). This move marks a critical step in the tech industry's efforts to address efficiency bottlenecks in ultra-large-scale AI clusters.

The core objective of the protocol is to improve the performance of large AI training clusters. In previous model training runs, expensive GPU computing power often sat idle because of network fluctuations or unevenly distributed load, wasting significant resources. By providing a more reliable multipath connection scheme, MRC aims to make data transmission markedly more stable, thereby reducing power consumption and improving overall computing efficiency.

According to the available information, MRC is not merely a theoretical proposal: it is already fully deployed within OpenAI. All of the large supercomputers the company uses to develop cutting-edge models run the protocol, including the Oracle Cloud Infrastructure (OCI) site in Abilene, Texas, USA, and Microsoft's Fairwater supercomputer cluster.

As AI model parameter counts continue to grow, the efficiency of the underlying infrastructure has become a new battleground among major companies. By launching an open protocol jointly with multiple chip manufacturers and cloud service providers, OpenAI is not only trying to bring down its own training costs but also signaling its intention to lead network communication standards in the AI era. For the industry, the release of MRC may push ultra-large-scale computing clusters into a more efficient and greener phase.