Alibaba's Breakthrough Tech Shocks the Scene! A 0.6B Small Model Converted into a 17B MoE with Only 5% Activated Parameters, Running at 30 Tokens/s Directly on a CPU!
The Alibaba International Digital Commerce (AIDC) team launched the Marco-Mini-Instruct model: 17.3B total parameters with only 0.86B activated per token, giving high inference efficiency and smooth operation on ordinary CPUs. With 8-bit quantization and four DDR4-2400 memory modules, inference speed reaches about 30 tokens/s, pushing the MoE architecture toward practical deployment.
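To see why roughly 30 tokens/s on a plain CPU is plausible, here is a back-of-envelope sketch. CPU decoding is typically memory-bandwidth-bound, so the throughput ceiling is roughly (peak memory bandwidth) / (bytes of weights read per token). The parameter count, quantization width, and DDR4-2400 figure come from the article; the assumption that the four modules populate four independent channels, and the bandwidth-bound model itself, are simplifying assumptions for illustration.

```python
# Back-of-envelope estimate of CPU decode throughput for a bandwidth-bound MoE.
# Figures marked "from the article" are quoted; the rest are stated assumptions.

ACTIVE_PARAMS = 0.86e9   # activated parameters per token (from the article)
BYTES_PER_PARAM = 1      # 8-bit quantization => 1 byte per weight (from the article)
CHANNELS = 4             # four DDR4 modules, assumed one per channel (assumption)
DDR4_MT_S = 2400e6       # DDR4-2400: 2400 mega-transfers per second (from the article)
BUS_BYTES = 8            # 64-bit channel => 8 bytes per transfer

bandwidth = CHANNELS * DDR4_MT_S * BUS_BYTES       # peak bytes/s (~76.8 GB/s)
bytes_per_token = ACTIVE_PARAMS * BYTES_PER_PARAM  # weight bytes read per decoded token
ceiling = bandwidth / bytes_per_token              # idealized tokens/s ceiling

print(f"Peak memory bandwidth: {bandwidth / 1e9:.1f} GB/s")     # 76.8 GB/s
print(f"Weights read per token: {bytes_per_token / 1e9:.2f} GB") # 0.86 GB
print(f"Theoretical ceiling: {ceiling:.0f} tokens/s")            # ~89 tokens/s
```

The reported ~30 tokens/s is about a third of this idealized ceiling, which is in line with real systems once KV-cache reads, imperfect bandwidth utilization, and compute overhead are accounted for; a dense 17.3B model reading 20x more weights per token would land far below interactive speeds on the same hardware, which is the whole point of the sparse MoE design.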