AIbase · 2025-02-12 14:04
ByteDance's UltraMem Architecture Reduces Large Model Inference Costs by 83%
The ByteDance Doubao large model team announced today that it has developed a new sparse model architecture called UltraMem. The architecture addresses the high memory-access cost of inference in MoE (Mixture of Experts) models, improving inference speed by 2 to 6 times over MoE and reducing inference costs by up to 83%. This breakthrough opens a new path toward efficient inference for large models. UltraMem resolves the memory-access bottleneck of MoE inference while maintaining model performance: experimental results show that, with parameter counts and activation conditions held equal, UltraMem delivers these speed and cost gains without degrading model quality.
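To make the memory-access contrast concrete, below is a minimal, illustrative PyTorch sketch. It is not ByteDance's UltraMem implementation (which the announcement does not include); the class names, dimensions, and the product-key remark are assumptions for illustration only. The point it shows is general: a toy MoE feed-forward layer must read the full weight matrices of every routed expert for each token, whereas a sparse memory-style layer gathers only a handful of small value vectors per token, which is the kind of memory-traffic reduction the announcement describes.

```python
# Illustrative sketch only -- not ByteDance's actual UltraMem code.
# Contrast: MoE dispatch reads whole expert FFN weight matrices per routed token,
# while a sparse memory layer touches only top_k small value rows per token.
import torch
import torch.nn as nn


class MoEFFN(nn.Module):
    """Toy MoE feed-forward: each token activates the full FFNs of its top-k experts."""
    def __init__(self, d_model=512, d_ff=2048, n_experts=16, top_k=2):
        super().__init__()
        self.router = nn.Linear(d_model, n_experts)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
            for _ in range(n_experts)
        )
        self.top_k = top_k

    def forward(self, x):                              # x: (tokens, d_model)
        weights, idx = self.router(x).topk(self.top_k, dim=-1)
        weights = weights.softmax(dim=-1)
        out = torch.zeros_like(x)
        for k in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = idx[:, k] == e
                if mask.any():
                    # the entire expert FFN (d_model*d_ff*2 weights) is read for these tokens
                    out[mask] += weights[mask, k:k + 1] * expert(x[mask])
        return out


class SparseMemoryFFN(nn.Module):
    """Toy sparse memory layer: each token gathers only top-k small value vectors."""
    def __init__(self, d_model=512, n_values=65536, top_k=32):
        super().__init__()
        self.keys = nn.Parameter(torch.randn(n_values, d_model) * 0.02)
        self.values = nn.Parameter(torch.randn(n_values, d_model) * 0.02)
        self.top_k = top_k

    def forward(self, x):                              # x: (tokens, d_model)
        # naive full scoring for clarity; real memory-layer designs use product keys
        # or similar tricks so that scoring itself stays cheap
        weights, idx = (x @ self.keys.t()).topk(self.top_k, dim=-1)
        weights = weights.softmax(dim=-1)
        gathered = self.values[idx]                    # only top_k rows per token are read
        return (weights.unsqueeze(-1) * gathered).sum(dim=1)


if __name__ == "__main__":
    x = torch.randn(8, 512)
    print(MoEFFN()(x).shape, SparseMemoryFFN()(x).shape)  # both (8, 512)
```

In this toy setup, the per-token memory read of the sparse layer is `top_k * d_model` value entries versus two full `d_model * d_ff` matrices per activated expert in the MoE layer; the actual UltraMem design and its reported 2-6x speedup are described only at a high level in the announcement.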