This is a small Vision Transformer (ViT) based on the DINOv3 architecture, trained through knowledge distillation on the LVD-1689M dataset. It is designed for efficient image feature extraction and can serve as a backbone for computer vision tasks such as image classification, feature map extraction, and image embedding.
Computer Vision
Timm
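As an illustration of the feature extraction and image embedding uses described above, here is a minimal sketch using timm. The checkpoint identifier `vit_small_patch16_dinov3.lvd1689m` is an assumption (the description does not name it), and the helper names `load_model`, `embed_image`, and `cosine_similarity` are hypothetical. A timm release recent enough to include DINOv3 models is assumed.

```python
import math

# Assumed identifier: the description does not name the exact checkpoint.
MODEL_NAME = "vit_small_patch16_dinov3.lvd1689m"


def load_model(model_name=MODEL_NAME):
    """Load the backbone without a classifier head, plus its eval transform."""
    import timm  # imported lazily so cosine_similarity works without timm

    # num_classes=0 removes the classification head, leaving a feature extractor.
    model = timm.create_model(model_name, pretrained=True, num_classes=0).eval()
    config = timm.data.resolve_model_data_config(model)
    transform = timm.data.create_transform(**config, is_training=False)
    return model, transform


def embed_image(model, transform, image_path):
    """Return (pooled embedding as a list of floats, unpooled token tensor)."""
    import torch
    from PIL import Image

    image = Image.open(image_path).convert("RGB")
    batch = transform(image).unsqueeze(0)  # add a batch dimension
    with torch.no_grad():
        tokens = model.forward_features(batch)  # shape (1, num_tokens, dim)
        pooled = model.forward_head(tokens, pre_logits=True)  # shape (1, dim)
    return pooled[0].tolist(), tokens


def cosine_similarity(a, b):
    """Cosine similarity between two embeddings given as sequences of floats."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0
```

A typical use would be to call `load_model()` once, embed two images with `embed_image`, and compare the pooled vectors with `cosine_similarity`. The unpooled tokens can be used when per-patch features are needed, while for classification the pooled embedding would usually feed a separately trained linear head.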