pipegoose
Large-scale 4D parallelism pre-training for 🤗 transformers in Mixture of Experts *(still work in progress)*
Topics: 3d-parallelism, data-parallelism, distributed-optimizers, huggingface-transformers, large-scale-language-modeling, megatron, megatron-lm, mixture-of-experts, model-parallelism, moe
Created: 2023-06-14T14:14:50
Updated: 2024-12-21T01:24:28
Stars: 87
Stars increase: 1