Image-Captioning-with-ViT-and-BERT
Public
A concise image-captioning pipeline that fine-tunes a ViT encoder paired with a BERT decoder on Flickr8K, plus a standalone script that loads the trained model and generates captions for new images.
artificial-intelligence, bert, computer-vision, deep-learning, deeplearning, fine-tuning, flickr8k-dataset, huggingface, huggingface-transformers, image-captioning
Created: 2025-05-25T20:30:33
Updated: 2025-05-25T21:57:58
Stars: 1
Stars Increase: 0