In AI-generated art, it has long been taken for granted that producing high-quality images and videos requires larger models, more parameters, and more computing power. A research team from the Hong Kong University of Science and Technology and Kuaishou Technology has now proposed EvoSearch (evolutionary search), a technique that directly challenges this assumption.
Its most striking results: with EvoSearch, the 865M-parameter Stable Diffusion 2.1 model generates images whose quality surpasses those of GPT-4o, and the 1.3B-parameter Wan model can match a 14B-parameter model roughly ten times its size.
Challenges of Existing AI Generation Models
Today's mainstream AI generation models fall into two main categories: diffusion models and flow models. Diffusion models produce a clean image by removing noise step by step, much like progressively cleaning up a noisy photo; flow models carry random noise to the target image through a series of smooth, continuous transformations.
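To make the flow-model description concrete, here is a toy sketch. It assumes a deliberately trivial setting that is not from the article: the "data" is a single hypothetical point `target`, so the velocity field along a straight noise-to-data path is known in closed form. A real flow model learns this velocity with a neural network over images, but the sampler still integrates it step by step in the same way.

```python
import random


def toy_velocity(x: float, t: float, target: float) -> float:
    """Velocity field for a straight path from noise x0 (t = 0) to `target` (t = 1).

    On the path x_t = (1 - t) * x0 + t * target, the velocity is
    dx/dt = target - x0, which equals (target - x_t) / (1 - t)."""
    return (target - x) / (1.0 - t)


def flow_sample(x0: float, target: float, num_steps: int = 10) -> float:
    """Integrate dx/dt = v(x, t) from t = 0 to t = 1 with explicit Euler steps."""
    dt = 1.0 / num_steps
    x = x0
    for i in range(num_steps):
        t = i * dt  # t stays below 1, so the division in toy_velocity is safe
        x = x + dt * toy_velocity(x, t, target)
    return x


random.seed(0)
noise = random.gauss(0.0, 1.0)  # start from random noise
print(flow_sample(noise, target=3.0))  # within floating-point error of 3.0
```

Because the toy path is a straight line, each Euler step lands back on it, so even a few steps reach `target`; with a learned, curved velocity field, more steps are usually needed.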
To improve these models, the industry generally takes one of two approaches. The first is to keep scaling up model size and training data during the training phase, but this "brute force works miracles" approach is extremely expensive and is already approaching the limits of available resources. The second is to spend more computation at inference time, with methods such as Best-of-N sampling (generate N images and keep the best one) and particle sampling (maintain a set of candidates during generation and repeatedly keep the high-scoring ones).
However, each of these methods has clear drawbacks. Best-of-N is inefficient: most of its computation goes into samples that are simply thrown away. Particle sampling is too conservative: it only reselects among the candidates it already has, so it easily gets stuck in local optima and does not actively explore. Other approaches, such as fine-tuning, either require additional training or tend to reduce the diversity of generated samples.
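The two inference-time strategies can be sketched in a few lines. This is an illustrative toy, not code from the paper: `best_of_n`, `particle_step`, and the reward that prefers values near 2 are hypothetical, and `particle_step` shows only the resampling step of a basic particle sampler, leaving out the denoising steps that run between resamplings.

```python
import math
import random
from typing import Callable, List


def best_of_n(sample: Callable[[], float], reward: Callable[[float], float],
              n: int) -> float:
    """Best-of-N: draw n independent samples, score each, keep the best.

    The other n - 1 samples are discarded, which is where compute is wasted."""
    candidates = [sample() for _ in range(n)]
    return max(candidates, key=reward)


def particle_step(particles: List[float], reward: Callable[[float], float],
                  temperature: float = 1.0) -> List[float]:
    """One resampling step of a basic particle sampler.

    Each particle is weighted by exp(reward / temperature) and the set is
    resampled with replacement, so high-reward particles get duplicated and
    low-reward ones tend to disappear. Resampling itself only copies existing
    particles; it never proposes new ones."""
    scores = [reward(p) / temperature for p in particles]
    top = max(scores)  # subtract the max before exp() for numerical stability
    weights = [math.exp(s - top) for s in scores]
    return random.choices(particles, weights=weights, k=len(particles))


def toy_reward(x: float) -> float:
    """Hypothetical reward that prefers values close to 2."""
    return -(x - 2.0) ** 2


random.seed(0)
best = best_of_n(lambda: random.gauss(0.0, 1.0), toy_reward, n=16)
particles = [random.gauss(0.0, 1.0) for _ in range(8)]
resampled = particle_step(particles, toy_reward)
print(best, resampled)
```

The resampled set contains only copies of existing particles, which is the sense in which particle sampling reselects rather than explores; in a full sampler, any new variation comes from the noise of the denoising steps themselves.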
EvoSearch: The "Theory of Evolution" in the Field of AI Art
The core idea of EvoSearch is to bring the principles of Darwinian evolution into the generation process. It treats image generation as the evolution of a species: it first creates an initial "population" of random noise samples, then scores intermediate results with a reward function ("fitness assessment"), keeps the best individuals ("survival of the fittest"), and finally produces new candidates through specially designed "mutation" operations.
These mutation operations are EvoSearch's key technical contribution. For the initial noise, the system mutates a sample by adding a controlled amount of Gaussian noise; for intermediate states in the denoising process, it introduces controllable perturbations modeled on the way stochastic differential equation (SDE) samplers inject randomness. This design lets the search explore new regions while preserving the strong "genes" of the parents.
Compared with traditional methods, EvoSearch has three main advantages. First, it explores actively rather than merely filtering, so it is not confined to the initial pool of candidates. Second, it balances exploration and exploitation, avoiding premature convergence to local optima. Third, it generalizes well: it applies to a variety of diffusion and flow models without modifying the model architecture or requiring additional training.
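The evolutionary loop can be sketched as a small search over noise vectors. This is a simplified illustration under stated assumptions, not the authors' implementation (the linked repository has that): it evolves only the initial noise, replaces "generate, then score with a reward model" by a hypothetical `toy_fitness` function, and uses simple elitist truncation selection as a stand-in for whatever selection scheme EvoSearch actually uses.

```python
import math
import random
from typing import Callable, List, Tuple

Vector = List[float]


def mutate(z: Vector, beta: float) -> Vector:
    """Blend z with fresh Gaussian noise; if z ~ N(0, I), the child is too."""
    keep = math.sqrt(1.0 - beta ** 2)
    return [keep * zi + beta * random.gauss(0.0, 1.0) for zi in z]


def evolve(fitness: Callable[[Vector], float], dim: int, pop_size: int = 16,
           num_elites: int = 4, generations: int = 10,
           beta: float = 0.3) -> Tuple[Vector, List[float]]:
    """Toy evolutionary search over initial noise vectors.

    `fitness(z)` stands in for "generate an image from noise z and score it
    with a reward model". Returns the best vector found and the best fitness
    recorded at each generation.
    """
    # Initial population: independent Gaussian noise vectors.
    population = [[random.gauss(0.0, 1.0) for _ in range(dim)]
                  for _ in range(pop_size)]
    history: List[float] = []
    for _ in range(generations):
        # Fitness assessment: rank every individual, best first.
        ranked = sorted(population, key=fitness, reverse=True)
        history.append(fitness(ranked[0]))
        # Survival of the fittest: the top individuals pass on unchanged.
        elites = ranked[:num_elites]
        # Mutation: refill the population with mutated copies of random elites.
        children = [mutate(random.choice(elites), beta)
                    for _ in range(pop_size - num_elites)]
        population = elites + children
    best = max(population, key=fitness)
    return best, history


def toy_fitness(z: Vector) -> float:
    """Hypothetical reward: higher when z is closer to the all-ones vector."""
    return -sum((zi - 1.0) ** 2 for zi in z)


random.seed(0)
best, history = evolve(toy_fitness, dim=4)
print(history)  # best fitness per generation; never decreases
```

Keeping the elites unchanged guarantees that the best fitness never drops from one generation to the next, while the mutated children explore the neighborhood around them.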
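The two kinds of mutation can be sketched as follows. The exact formulas are assumptions for illustration: for initial noise, the sketch uses a variance-preserving mix with fresh Gaussian noise controlled by a hypothetical strength `beta`; for intermediate states, it adds Gaussian noise scaled by a hypothetical per-step `sigma_t`, mirroring how an SDE sampler injects noise at each step. The paper describes EvoSearch's actual operators and schedules.

```python
import math
import random
import statistics
from typing import List

Vector = List[float]


def mutate_initial_noise(z: Vector, beta: float) -> Vector:
    """Mutate an initial-noise vector z ~ N(0, I), with 0 <= beta <= 1.

    child = sqrt(1 - beta^2) * z + beta * eps, eps ~ N(0, I).
    Because (1 - beta^2) + beta^2 = 1, the child is again N(0, I), so it stays
    a valid starting point for the sampler; beta sets how far the child moves."""
    keep = math.sqrt(1.0 - beta ** 2)
    return [keep * zi + beta * random.gauss(0.0, 1.0) for zi in z]


def mutate_intermediate(x_t: Vector, sigma_t: float) -> Vector:
    """Mutate a partially denoised state x_t in the spirit of an SDE sampler step.

    Assumption: the perturbation re-injects Gaussian noise at the sampler's
    current noise scale sigma_t, so mutations shrink automatically as sigma_t
    decreases toward the end of denoising."""
    return [xi + sigma_t * random.gauss(0.0, 1.0) for xi in x_t]


random.seed(0)
z = [random.gauss(0.0, 1.0) for _ in range(20000)]
child = mutate_initial_noise(z, beta=0.5)
print(round(statistics.pstdev(child), 2))  # close to 1.0
```

The variance-preserving form keeps mutated noise a valid sampler input (its standard deviation stays near 1), and scaling intermediate mutations by `sigma_t` keeps late-stage perturbations small, so children stay close to well-developed parents.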
Experimental Results: Winning Across the Board
The research team evaluated EvoSearch extensively on both image and video generation tasks, and it significantly outperformed existing baseline methods on all reported metrics.
In image generation, as the inference compute budget grows, the image quality and text alignment achieved by EvoSearch keep improving steadily, while other methods quickly plateau. For complex or ambiguous prompts, EvoSearch follows the prompt more faithfully while producing more diverse backgrounds, poses, and other details.
The results in video generation are even more striking. With both the Wan 1.3B and the HunyuanVideo 13B models, EvoSearch's generation quality clearly surpasses the baselines. Most notably, when Wan 1.3B is given the same inference time budget as Wan 14B, Wan 1.3B with EvoSearch matches or even surpasses the larger model.
Notably, even when the evaluation metrics do not fully match the reward function used to guide the search, EvoSearch still generalizes well and is less prone to over-optimizing a particular reward function. In human evaluations, videos generated with EvoSearch achieved higher win rates in visual quality, motion quality, text alignment, and overall quality.
Technical Insights and Future Prospects
EvoSearch's success offers important lessons for the AI generation field. First, as training grows ever more expensive, spending more computation at inference time to improve model performance is a highly promising direction. Second, borrowing the ideas of selection and mutation from biological evolution can effectively overcome the limitations of traditional search methods.
More importantly, the technique's success rests on a deep understanding of the denoising process in diffusion and flow models. EvoSearch exploits the structure of the state space these models traverse during denoising and designs its mutation strategies around it, which lets it explore the vast space of possible outputs more effectively.
EvoSearch still leaves room for improvement. The research team points out that future work includes designing smarter mutation strategies and better balancing exploration against computational cost.
This technology points to an important trend: without simply chasing larger models and more training data, we can still unlock more of an AI model's potential by applying smarter search strategies at inference time. EvoSearch opens an era of "intelligent evolution" in AI creation, letting small models produce stunning work.
Project homepage: https://tinnerhrhe.github.io/evosearch/
Code: https://github.com/tinnerhrhe/EvoSearch-codes
Paper: https://arxiv.org/abs/2505.17618