BERT-pre-training
Public
Multi-GPU pre-training for BERT on a single machine without Horovod (data parallelism)
Created: 2018-12-25T15:45:36
Updated: 2025-03-15T22:54:03
Stars: 172
Stars Increase: 0