The translated data: DragNUWA is a video generation model based on diffusion algorithms, designed to address the issue of fine-grained control in video generation. This model introduces text, image, and trajectory information, providing fine control from semantic, spatial, and temporal perspectives. Experiments have demonstrated that the DragNUWA model excels in fine-grained control for video generation.