DermLIP is a vision-language model for dermatology, trained on Derm1M, which its authors describe as the largest dermatology image-text corpus. It uses a CLIP-style architecture, pairing an image encoder with a text encoder in a shared embedding space, and supports dermatology tasks including zero-shot classification, few-shot learning, cross-modal retrieval, and concept annotation.
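As a rough illustration of how zero-shot classification works in a CLIP-style model, the sketch below scores one image embedding against a text embedding per candidate label using cosine similarity, then applies a softmax. It does not load DermLIP: the three-dimensional vectors, the label names, and the `zero_shot_classify` helper are hypothetical stand-ins, and the logit scale of 100 is only the value commonly used with CLIP-style models, assumed here rather than taken from DermLIP.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length so a dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def zero_shot_classify(image_emb, text_embs, labels, logit_scale=100.0):
    """Return a label -> probability mapping for one image embedding.

    image_emb and text_embs are assumed to come from the image and text
    encoders of a CLIP-style model; here they are toy stand-ins.
    logit_scale=100.0 is an assumed, commonly used CLIP temperature.
    """
    img = l2_normalize(image_emb)
    logits = []
    for text_emb in text_embs:
        txt = l2_normalize(text_emb)
        cosine = sum(a * b for a, b in zip(img, txt))
        logits.append(logit_scale * cosine)

    # Numerically stable softmax over the label logits.
    peak = max(logits)
    exps = [math.exp(value - peak) for value in logits]
    total = sum(exps)
    return {label: e / total for label, e in zip(labels, exps)}


# Hypothetical labels and toy embeddings, for illustration only.
labels = ["melanoma", "psoriasis", "eczema"]
text_embs = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
image_emb = [0.9, 0.1, 0.2]

probs = zero_shot_classify(image_emb, text_embs, labels)
best = max(probs, key=probs.get)
print(best, round(probs[best], 3))
```

In practice the text embeddings would come from prompts such as "a photo of melanoma" passed through the text encoder, and no task-specific training is needed to add or change labels, which is what makes the classification zero-shot.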
Multimodal
Transformers
English