MiniGPT-and-DeepSeek-MLA-Multi-Head-Latent-Attention
An efficient and scalable attention module designed to reduce memory usage and improve inference speed in large language models. The Multi-Head Latent Attention (MLA) module is implemented as a drop-in replacement for traditional multi-head attention (MHA).
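The memory savings come from caching a compact latent vector instead of full per-head keys and values. Below is a minimal sketch of that idea (not the repository's actual code); the dimension names (d_model, d_latent, n_heads) are illustrative assumptions, and causal masking and rotary embeddings are omitted for brevity.

```python
# Minimal MLA-style attention sketch (illustrative, not the repo's implementation).
# Keys and values are reconstructed from a shared low-rank latent, so only the
# small latent tensor needs to be cached during autoregressive decoding.
import math
import torch
import torch.nn as nn


class MultiHeadLatentAttention(nn.Module):
    def __init__(self, d_model=512, n_heads=8, d_latent=128):
        super().__init__()
        self.n_heads = n_heads
        self.d_head = d_model // n_heads
        self.q_proj = nn.Linear(d_model, d_model, bias=False)
        # Down-project hidden states into a compact latent (this is what gets cached).
        self.kv_down = nn.Linear(d_model, d_latent, bias=False)
        # Up-project the latent back into per-head keys and values.
        self.k_up = nn.Linear(d_latent, d_model, bias=False)
        self.v_up = nn.Linear(d_latent, d_model, bias=False)
        self.out_proj = nn.Linear(d_model, d_model, bias=False)

    def forward(self, x, latent_cache=None):
        B, T, _ = x.shape
        latent = self.kv_down(x)                     # (B, T, d_latent)
        if latent_cache is not None:                 # append new latents at decode time
            latent = torch.cat([latent_cache, latent], dim=1)
        S = latent.size(1)

        q = self.q_proj(x).view(B, T, self.n_heads, self.d_head).transpose(1, 2)
        k = self.k_up(latent).view(B, S, self.n_heads, self.d_head).transpose(1, 2)
        v = self.v_up(latent).view(B, S, self.n_heads, self.d_head).transpose(1, 2)

        attn = (q @ k.transpose(-2, -1)) / math.sqrt(self.d_head)
        attn = attn.softmax(dim=-1)
        out = (attn @ v).transpose(1, 2).reshape(B, T, -1)
        return self.out_proj(out), latent            # latent doubles as the new cache
```

Under these assumptions, the per-token cache shrinks from 2 * d_model values (separate K and V) to d_latent values, which is where the inference-time memory reduction comes from.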
Created: 2025-04-08T23:10:50
Updated: 2025-04-09T06:41:35
Stars: 10
Stars Increase: 0