VisionTransformer
Public TensorFlow and PyTorch implementation of the Vision Transformer (ViT), as described in "An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale" by Alexey Dosovitskiy et al.
Created: 2021-06-25T18:02:54
Updated: 2024-12-15T19:04:14
Stars: 0
Stars Increase: 0