ByteDance and Peking University have jointly built a cluster of more than ten thousand GPUs and, on top of it, the MegaScale training system, which can train a GPT-3-scale (175B-parameter) model in just 1.75 days. The system reaches 55.2% Model FLOPs Utilization (MFU), surpassing NVIDIA's Megatron-LM. To improve both efficiency and stability, the team optimized across the stack, including algorithmic changes, communication-computation overlap, and operator optimization. ByteDance already operates a GPU cluster of more than ten thousand cards and is building a large-scale cluster based on the Hopper architecture.
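
To make the communication-computation overlap idea concrete, below is a minimal sketch in PyTorch of launching gradient all-reduces asynchronously so they run concurrently with the remaining backward pass. This is an illustrative assumption, not MegaScale's actual implementation: it assumes torch.distributed is already initialized (e.g. with the NCCL backend) and PyTorch >= 2.1 for register_post_accumulate_grad_hook; the class name OverlappedGradSync is hypothetical.

```python
# Illustrative sketch only, not MegaScale's code: overlap gradient all-reduce
# communication with backward computation using PyTorch async collectives.
# Assumes dist.init_process_group(...) has already been called (NCCL backend).
import torch
import torch.distributed as dist


class OverlappedGradSync:
    """Launch an async all-reduce for each parameter's gradient as soon as it
    is produced during backward, so communication runs concurrently with the
    rest of the backward computation."""

    def __init__(self, model: torch.nn.Module):
        self.pending = []  # Work handles for in-flight collectives
        world_size = dist.get_world_size()

        def make_hook(param):
            def hook(_):
                # Pre-scale so a SUM reduction yields the mean gradient.
                param.grad.div_(world_size)
                work = dist.all_reduce(param.grad, op=dist.ReduceOp.SUM,
                                       async_op=True)
                self.pending.append(work)
            return hook

        for p in model.parameters():
            if p.requires_grad:
                # Fires during backward, right after the gradient is ready.
                p.register_post_accumulate_grad_hook(make_hook(p))

    def wait(self):
        # Block only once every collective has been issued.
        for work in self.pending:
            work.wait()
        self.pending.clear()


# Usage sketch: sync = OverlappedGradSync(model)
#               loss.backward()   # hooks launch all-reduces as grads appear
#               sync.wait()       # drain outstanding communication
#               optimizer.step(); optimizer.zero_grad()
```

In practice, production systems (PyTorch DDP, and large-scale setups of the kind the article describes) bucket many gradients into larger collectives rather than issuing one per parameter, which amortizes launch overhead while preserving the same overlap principle.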