Stable Video Diffusion (SVD) 1.1 Image-to-Video is a latent diffusion model that takes a still image as a conditioning frame and generates a short video clip from it. The model was trained to generate 25 frames at a resolution of 1024x576, given a context frame of the same size, and is fine-tuned from SVD Image-to-Video [25 frames]. Fine-tuning used fixed conditioning at 6 FPS and Motion Bucket Id 127, which improves the consistency of outputs without the need to adjust hyperparameters.
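
The values above (1024x576, 25 frames, 6 FPS, Motion Bucket Id 127) can be collected in one place and passed to an inference call. The following is a minimal sketch, not an official usage recipe. It assumes the Hugging Face `diffusers` library and its `StableVideoDiffusionPipeline`, a CUDA GPU, and the repository id `stabilityai/stable-video-diffusion-img2vid-xt-1-1`; none of these are stated in the description. The names `SVDConditioning`, `pipeline_kwargs`, and `generate_clip` are hypothetical helpers introduced for illustration, and whether the pipeline's `fps` argument corresponds exactly to the training-time value has not been verified here.

```python
from dataclasses import asdict, dataclass

# Assumed Hugging Face repository id; not given in the description above.
MODEL_ID = "stabilityai/stable-video-diffusion-img2vid-xt-1-1"


@dataclass(frozen=True)
class SVDConditioning:
    """Hypothetical container for the values stated in the model description."""
    width: int = 1024           # training resolution (width)
    height: int = 576           # training resolution (height)
    num_frames: int = 25        # frames per generated clip
    fps: int = 6                # fixed FPS conditioning used during fine-tuning
    motion_bucket_id: int = 127 # fixed motion conditioning used during fine-tuning


def pipeline_kwargs(cond: SVDConditioning = SVDConditioning()) -> dict:
    """Turn the conditioning values into keyword arguments for the pipeline call."""
    return asdict(cond)


def generate_clip(image_path: str, model_id: str = MODEL_ID, seed: int = 0):
    """Generate one clip from a still image (assumes diffusers, torch, and a CUDA GPU)."""
    # Heavy imports stay local so the helpers above work without a GPU stack.
    import torch
    from diffusers import StableVideoDiffusionPipeline
    from diffusers.utils import load_image

    cond = SVDConditioning()
    pipe = StableVideoDiffusionPipeline.from_pretrained(
        model_id, torch_dtype=torch.float16
    )
    pipe.to("cuda")

    # Resize the conditioning image to the resolution the model was trained on.
    image = load_image(image_path).resize((cond.width, cond.height))
    generator = torch.manual_seed(seed)  # fixed seed for repeatable sampling

    result = pipe(image, generator=generator, **pipeline_kwargs(cond))
    return result.frames[0]  # list of PIL images for the first (only) video
```

Calling `generate_clip("input.png")` with a hypothetical local image would return the generated frames as a list. Keeping the conditioning values in a single dataclass makes it easy to see when a call departs from the settings used during fine-tuning.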