2024-09-05 16:10:05.AIbase.11.6k
Llama3 Transforms into Mamba in Just 3 Days! Inference Speed Improved by 1.5 Times
The Mamba team's research focuses on distilling the large transformer model Llama into Mamba by designing a novel inference decoding algorithm that significantly enhances inference speed. The research aims to leverage Llama's extensive knowledge while reducing the high cost of training large models from scratch. The research team successfully transformed Zephyr-7B and Llama-38B into a linear RNN by combining progressive distillation, supervised fine-tuning, and preference optimization methods.