On August 12, at the 2025 Financial AI Inference Application Implementation and Development Forum, Huawei will release UCM (Inference Memory Data Manager), an AI inference technology it describes as a breakthrough. The technology is expected to reduce China's reliance on HBM (High Bandwidth Memory) for AI inference and significantly improve the inference performance of large models in China.

UCM is centered on the KV Cache and integrates multiple cache-acceleration algorithms. By managing the memory data generated during inference across a hierarchy of storage tiers, it expands the effective context window, delivers high-throughput, low-latency inference, and lowers the cost per token. The solution alleviates problems such as task stalls and delayed responses caused by insufficient HBM capacity.
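To make the tiering idea concrete, here is a minimal sketch of hierarchical KV-cache management in Python. It is illustrative only: the `TieredKVCache` class, the tier sizes, and the LRU spill/promote policy are assumptions for exposition, not Huawei's actual UCM design or API. The point is that KV blocks evicted from scarce HBM spill into DRAM and then SSD instead of being discarded, so long contexts can be served without recomputing attention.

```python
from collections import OrderedDict

class TieredKVCache:
    """Toy three-tier KV cache: the hot tier models scarce HBM, the warm
    tier models DRAM, and the cold tier models SSD. Blocks evicted from a
    full tier spill into the next tier instead of being recomputed."""

    def __init__(self, hbm_slots=2, dram_slots=4, ssd_slots=8):
        # Each tier is (name, capacity, ordered store); insertion order
        # doubles as LRU order.
        self.tiers = [
            ("HBM", hbm_slots, OrderedDict()),
            ("DRAM", dram_slots, OrderedDict()),
            ("SSD", ssd_slots, OrderedDict()),
        ]

    def put(self, token_id, kv_block):
        """Insert into the hot tier; cascade LRU evictions downward."""
        victim = (token_id, kv_block)
        for name, capacity, store in self.tiers:
            if victim is None:
                break
            store[victim[0]] = victim[1]
            store.move_to_end(victim[0])  # mark as most recently used
            victim = store.popitem(last=False) if len(store) > capacity else None
        # If victim is still set here, the coldest tier overflowed and the
        # block is dropped; a real system would recompute it on demand.

    def get(self, token_id):
        """Look up a KV block; promote hits back to the hot tier."""
        for _, _, store in self.tiers:
            if token_id in store:
                block = store.pop(token_id)
                self.put(token_id, block)  # reinsert hot, refresh LRU order
                return block
        return None  # miss: attention for this token must be recomputed
```

As a usage illustration, after calling `cache.put(t, block)` for more tokens than the HBM tier holds, older blocks remain retrievable via `cache.get(t)` from DRAM or SSD rather than triggering recomputation. Production systems manage multi-token KV blocks and weigh tier bandwidth against recompute cost, but the spill-and-promote structure is the same.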


At the forum, Huawei will jointly announce the latest AI inference application results with China UnionPay. Experts from institutions including the China Academy of Information and Communications Technology, Tsinghua University, and iFlytek will also share practical experience in accelerating and optimizing large model inference. Fan Jie, Vice President of Huawei's Data Storage Product Line, said that future AI breakthroughs will depend heavily on unlocking high-quality industry data, and that high-performance AI storage can cut data loading time from hours to minutes, raising compute-cluster utilization from 30% to 60%.

Industry analysts believe the release of UCM comes at a critical moment, as the AI industry shifts from pursuing the limits of model capability to optimizing the inference experience, which has become a key measure of AI's commercial value. Great Wall Securities noted that as large model capabilities keep improving and commercial scenarios expand, companies across the computing-power industry chain are well placed to seize new development opportunities.