At the end of 2023, it was predicted that the coming year would be a period of rapid development in video generation. One of Sora's core techniques is the conversion of visual data into a unified patch representation, which, combined with the Transformer architecture and diffusion models, exhibits excellent scaling behavior. The Tsinghua team shows great promise in video generation and is expected to be the Chinese team closest to Sora. Sora's emergence has led many researchers to worry that the gap in AI technology between China and other countries is widening. Looking ahead, what challenges does the video generation field still need to address, and what commercial opportunities will Sora bring?