In the latest SemiAnalysis InferenceMAX benchmark round, Signal65 analyzed inference performance on the DeepSeek-R1 0528 mixture-of-experts (MoE) model, and the results showed NVIDIA's GB200 NVL72 rack-scale system significantly outperforming an AMD Instinct MI355X cluster of similar scale. An MoE model activates only a small subset of its "expert" subnetworks for each input, which improves compute efficiency but, when scaled out across nodes, puts heavy pressure on inter-node communication latency and bandwidth, which can become the bottleneck.
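The routing idea described above can be sketched in a few lines. This is a minimal, illustrative top-k MoE forward pass in NumPy; the function names, expert count, and top-2 routing are assumptions for demonstration, not DeepSeek-R1's actual implementation.

```python
import numpy as np

rng = np.random.default_rng(0)
n_experts, d, top_k = 8, 16, 2  # illustrative sizes, not DeepSeek-R1's

# Each "expert" is a small feed-forward weight matrix; a router scores them.
experts = [rng.standard_normal((d, d)) for _ in range(n_experts)]
router = rng.standard_normal((d, n_experts))

def moe_forward(x):
    """Route one token vector to its top-k experts and mix their outputs."""
    logits = x @ router                      # one score per expert
    idx = np.argsort(logits)[-top_k:]        # indices of the k best experts
    w = np.exp(logits[idx] - logits[idx].max())
    w /= w.sum()                             # softmax over the chosen experts
    # Only k of n_experts matrices are ever multiplied: the efficiency win.
    return sum(wi * (x @ experts[i]) for i, wi in zip(idx, w))

y = moe_forward(rng.standard_normal(d))
```

In a real deployment the experts live on different GPUs or nodes, so every token's activations must be shuffled to its selected experts and back, which is exactly the all-to-all communication pressure the article refers to.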

NVIDIA optimized the GB200 NVL72's architecture through its "extreme co-design" strategy. The system tightly interconnects 72 GPUs and provides up to 30TB of shared memory, significantly improving data-transfer efficiency and mitigating the communication-latency problem. According to the test data, the GB200 NVL72 achieves throughput of up to 75 tokens/second per GPU under comparable configurations, 28 times the performance of the AMD MI355X.

For large-scale cloud providers, total cost of ownership (TCO) is a critical consideration. Based on Oracle Cloud pricing data, Signal65 noted that the GB200 NVL72 is not only fast but also remarkably cost-effective: its relative cost per token is roughly one-fifteenth that of the AMD solution, while delivering a higher interaction rate.
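Cost per token is straightforward arithmetic once hourly price and sustained throughput are known. The sketch below shows the calculation with placeholder numbers; the $300/hour rack rate is purely hypothetical and does not reflect actual Oracle Cloud pricing, and only the 75 tokens/s/GPU figure comes from the article.

```python
def cost_per_million_tokens(hourly_rate_usd: float, tokens_per_sec: float) -> float:
    """USD cost to generate one million tokens at a given hourly rate."""
    tokens_per_hour = tokens_per_sec * 3600
    return hourly_rate_usd / tokens_per_hour * 1_000_000

# Hypothetical example: a 72-GPU rack billed at $300/hour (made-up rate),
# sustaining the article's 75 tokens/s per GPU.
rack_tokens_per_sec = 75 * 72  # 5400 tokens/s for the whole rack
print(round(cost_per_million_tokens(300, rack_tokens_per_sec), 2))  # → 15.43
```

The same formula applied to a competing system's real price and measured throughput is how a relative cost-per-token ratio like "one-fifteenth" is derived.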

Although NVIDIA dominates in mixture-of-experts workloads, AMD retains competitive advantages. The report states that the AMD MI355X remains a competitive option for dense models thanks to its high-capacity HBM3E memory. AMD has not yet launched a rack-scale solution to counter the GB200 NVL72, but as its Helios platform squares off against NVIDIA's Vera Rubin platform, competition in rack-scale solutions will only intensify.

Key Points:   

🟢 NVIDIA's GB200 NVL72 delivers 28 times the performance of the AMD MI355X, a significant lead.

🟢 The GB200 NVL72 mitigates data-transfer latency through its optimized architecture and high-speed shared memory.

🟢 Although NVIDIA holds the advantage, AMD remains competitive for dense models, and future competition will be fiercer still.