Custom-Transformer-Pytorch
A clean, ground-up implementation of the Transformer architecture in PyTorch, including positional encoding, multi-head attention, encoder-decoder layers, and masking. Great for learning or building upon the core model.
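The repository's exact module names and API aren't shown here, so as an illustration of the core operation such an implementation is built on, here is a minimal, self-contained sketch of scaled dot-product attention with a causal mask (the function name and shapes are assumptions, not the repo's actual interface):

```python
import math
import torch

def scaled_dot_product_attention(q, k, v, mask=None):
    # q, k, v: (batch, heads, seq_len, d_k). Hypothetical helper,
    # not necessarily the signature used in this repository.
    d_k = q.size(-1)
    scores = q @ k.transpose(-2, -1) / math.sqrt(d_k)
    if mask is not None:
        # Positions where mask == 0 are blocked from attending.
        scores = scores.masked_fill(mask == 0, float("-inf"))
    attn = torch.softmax(scores, dim=-1)
    return attn @ v, attn

# Causal (look-ahead) mask: position i may attend only to positions <= i.
seq_len = 4
mask = torch.tril(torch.ones(seq_len, seq_len))
q = k = v = torch.randn(1, 2, seq_len, 8)  # (batch, heads, seq, d_k)
out, attn = scaled_dot_product_attention(q, k, v, mask)
```

Each attention row is a probability distribution over the allowed positions, and the causal mask zeroes out attention to future tokens, which is what the decoder-side masking in the description refers to.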