As global artificial intelligence development accelerates, the speed and efficiency of model inference have become focal points. Recently, Huawei's mathematical team introduced a new technology called FlashComm during the DeepSeek open-source cycle. The technology aims to significantly improve large model inference performance through three innovations, delivering speedups of up to 80%.

Firstly, FlashComm optimizes the AllReduce communication operation. Traditional AllReduce is like a container truck that always carries a full load, with no flexibility. Huawei's team decomposes the operation into two stages, a ReduceScatter followed by an AllGather, so the computation in between runs on only a fraction of the data. This reorganization reduces the subsequent communication volume by 35% and cuts the key computation to 1/8 of the original, improving inference performance by 22% to 26%.
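
Below is a minimal PyTorch sketch of the decomposition idea, assuming an already-initialized process group; the function name and the placement of the per-shard work are illustrative, not Huawei's actual implementation:

```python
import torch
import torch.distributed as dist

def decomposed_all_reduce(x: torch.Tensor) -> torch.Tensor:
    """Replace one AllReduce with ReduceScatter + AllGather.

    Assumes x.shape[0] is divisible by the world size.
    """
    world_size = dist.get_world_size()
    # ReduceScatter: each rank receives the reduced sum of
    # one 1/world_size shard of x.
    shard = torch.empty(
        x.shape[0] // world_size, *x.shape[1:],
        dtype=x.dtype, device=x.device,
    )
    dist.reduce_scatter_tensor(shard, x)
    # Per-token work placed here (normalization, quantization, ...)
    # now touches only 1/world_size of the data -- with 8-way
    # parallelism, 1/8 of the original computation.
    out = torch.empty_like(x)
    dist.all_gather_into_tensor(out, shard)
    return out
```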

Secondly, Huawei found during inference that adjusting the parallel dimensions of matrix multiplication can relieve the communication burden. By flattening three-dimensional tensors into two-dimensional matrices while preserving result accuracy, and combining this with INT8 quantization, the volume of transmitted data drops by 86% and overall inference speed increases by 33%. The strategy is akin to repacking bulky goods into smaller containers, making data transmission more efficient.
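
A rough sketch of the flatten-then-quantize step, assuming a symmetric per-tensor INT8 scale (the article does not spell out Huawei's exact quantization scheme):

```python
import torch

def flatten_and_quantize(x: torch.Tensor):
    """Flatten a [batch, seq, hidden] activation to 2-D, then quantize
    to INT8 so each element travels as 1 byte instead of 2 (FP16)."""
    x2d = x.reshape(-1, x.shape[-1])                       # [batch*seq, hidden]
    scale = x2d.abs().amax().clamp(min=1e-8) / 127.0       # per-tensor scale
    q = torch.clamp((x2d / scale).round(), -128, 127).to(torch.int8)
    return q, scale                                        # send q plus one scale

def dequantize(q: torch.Tensor, scale: torch.Tensor) -> torch.Tensor:
    """Recover an FP16 approximation on the receiving side."""
    return q.to(torch.float16) * scale
```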

Finally, Huawei's multi-stream parallelism technology breaks the limits of traditional serial computation. During MoE model inference, Huawei's team dissects and reorganizes the computation workflow, using the multi-stream engines of Ascend hardware to run three computational streams precisely in parallel. This lets one batch of data enter the expert-computation stage while another batch simultaneously goes through the gating decision, maximizing computational efficiency.
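
Conceptually, the overlap resembles the two-stream sketch below, which uses CUDA streams as a stand-in for Ascend's multi-stream engines; `gate` and `experts` are hypothetical callables for the gating and expert stages:

```python
import torch

def pipelined_moe_step(tokens_a, tokens_b, gate, experts):
    """Overlap batch A's expert computation with batch B's gating
    decision on two hardware streams (CUDA streams here, standing in
    for Ascend's multi-stream engines)."""
    s_expert = torch.cuda.Stream()
    s_gate = torch.cuda.Stream()
    with torch.cuda.stream(s_expert):
        out_a = experts(tokens_a)      # batch A: expert FFNs
    with torch.cuda.stream(s_gate):
        routes_b = gate(tokens_b)      # batch B: gating, overlapped
    torch.cuda.synchronize()           # join both streams before use
    return out_a, routes_b
```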

The release of FlashComm marks a significant technical breakthrough for Huawei in large model inference. It not only accelerates model inference but also propels AI applications forward, opening new opportunities in scientific research and industry.