relu-revival-normfree
Public
PyTorch implementation of normalization-free LLMs, investigating entropic behavior to find desirable activation functions
attention-we, entropy-collapse, gelu, gpt-2, leaky-relu, llm-architecture, llm-evaluation, llm-inference, model-optimization, normalization-free-training
Created: 2024-10-26T03:34:51
Updated: 2024-11-03T04:42:01
https://arxiv.org/abs/2410.09637
Stars: 0
Stars Increase: 0