Compressed-Transformers
In this repository, we explore model compression for transformer architectures via quantization. Specifically, we apply quantization-aware training to the linear layers and evaluate performance at 8-bit, 4-bit, 2-bit, and 1-bit (binary) quantization; a minimal sketch of the idea follows.
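As a rough illustration of how quantization-aware training of a linear layer can work, here is a minimal PyTorch sketch using symmetric uniform fake quantization with a straight-through estimator (STE). The names `fake_quantize` and `QuantLinear` are illustrative assumptions, not this repository's API.

```python
# Minimal QAT sketch, assuming symmetric uniform quantization with an STE.
import torch
import torch.nn as nn
import torch.nn.functional as F


def fake_quantize(w: torch.Tensor, bits: int) -> torch.Tensor:
    """Quantize w to `bits` bits in the forward pass; let gradients
    pass through unchanged (straight-through estimator)."""
    if bits == 1:
        # Binary case: sign of the weight, scaled by the mean
        # absolute value (a common BinaryConnect-style choice).
        scale = w.abs().mean()
        q = torch.sign(w) * scale
    else:
        # Symmetric uniform quantization to integer levels in
        # [-(2^(bits-1) - 1), 2^(bits-1) - 1].
        qmax = 2 ** (bits - 1) - 1
        scale = w.abs().max().clamp(min=1e-8) / qmax
        q = torch.round(w / scale).clamp(-qmax, qmax) * scale
    # STE: forward uses q; backward treats quantization as identity.
    return w + (q - w).detach()


class QuantLinear(nn.Module):
    """Linear layer whose weights are fake-quantized during training."""

    def __init__(self, in_features: int, out_features: int, bits: int = 8):
        super().__init__()
        self.linear = nn.Linear(in_features, out_features)
        self.bits = bits

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        w_q = fake_quantize(self.linear.weight, self.bits)
        return F.linear(x, w_q, self.linear.bias)
```

In a setup like this, the `nn.Linear` modules of a pretrained transformer would be replaced by `QuantLinear` at the desired bit width, and the model fine-tuned so the weights adapt to the quantization noise before deployment.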