ModuleFormer
ModuleFormer is a MoE-based architecture that includes two different types of experts: stick-breaking attention heads and feedforward experts. We released a collection of ModuleFormer-based Language Models (MoLM) ranging in scale from 4 billion to 8 billion parameters.
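To make the feedforward-expert half of the design concrete, here is a minimal sketch of top-k token routing over a set of feedforward experts, the standard MoE pattern the description refers to. This is not the repository's implementation, and it does not cover the stick-breaking attention heads; all names (`MoEFeedForward`, `num_experts`, `top_k`) are illustrative.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MoEFeedForward(nn.Module):
    """Illustrative MoE layer: each token is routed to its top-k feedforward experts."""

    def __init__(self, dim: int, hidden: int, num_experts: int = 8, top_k: int = 2):
        super().__init__()
        self.top_k = top_k
        self.router = nn.Linear(dim, num_experts)  # per-token score for each expert
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(dim, hidden), nn.GELU(), nn.Linear(hidden, dim))
            for _ in range(num_experts)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq, dim)
        scores = self.router(x)                         # (B, S, num_experts)
        weights, idx = scores.topk(self.top_k, dim=-1)  # (B, S, top_k)
        weights = F.softmax(weights, dim=-1)            # normalize over chosen experts
        out = torch.zeros_like(x)
        for k in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = idx[..., k] == e                 # tokens whose k-th choice is expert e
                if mask.any():
                    out[mask] += weights[..., k][mask].unsqueeze(-1) * expert(x[mask])
        return out

# Usage: shapes are preserved, so the layer drops into a transformer block.
layer = MoEFeedForward(dim=512, hidden=2048)
y = layer(torch.randn(2, 16, 512))  # -> (2, 16, 512)
```

Because only `top_k` experts run per token, compute per token stays roughly constant as `num_experts` grows, which is the usual motivation for MoE layers of this kind.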