Small Model Triumph! HKUST and Kuaishou Jointly Develop Evolutionary Search Technology, Letting AI Art Generation Move Beyond 'Brawn Over Brains'

In AI image and video generation, it has long been assumed that higher quality requires larger models, more parameters, and more computing power. A research team from the Hong Kong University of Science and Technology (HKUST) and Kuaishou Technology has recently proposed EvoSearch (Evolutionary Search), a method that challenges this assumption.

The most striking results: with EvoSearch, the 865M-parameter Stable Diffusion 2.1 model surpasses GPT-4o in generation quality, and the 1.3B-parameter Wan model paired with EvoSearch can match a 14B-parameter model roughly ten times its size.

Challenges of Existing AI Generation Models

Today's mainstream generative models fall into two main categories: diffusion models and flow models. Diffusion models produce a clear image by gradually removing noise, much like sharpening a blurry photo step by step; flow models transform random noise into the target image through a series of smooth transformations.

To improve these models, the industry generally follows one of two strategies. The first is to keep increasing model size and training data, but this brute-force approach is extremely costly and is already running up against resource limits. The second is to optimize at inference time, using methods such as Best-of-N sampling (generate N images and keep the best one) and particle sampling (maintain multiple candidates and keep the strongest ones at each step).
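Best-of-N sampling can be sketched in a few lines. The example below is a toy illustration only: `toy_generate` and `toy_reward` are hypothetical stand-ins for a real generative model and a real reward model, and the "images" are short lists of numbers.

```python
import random

def toy_generate(noise):
    """Hypothetical stand-in for a generative model: maps noise to a 'sample'."""
    return [2.0 * x for x in noise]

def toy_reward(sample):
    """Hypothetical reward: higher when the sample is close to a target of all ones."""
    return -sum((s - 1.0) ** 2 for s in sample)

def best_of_n(n, dim=4, seed=0):
    """Draw n independent noises, generate one sample from each, keep the best."""
    rng = random.Random(seed)
    best_sample, best_score = None, float("-inf")
    for _ in range(n):
        noise = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        sample = toy_generate(noise)
        score = toy_reward(sample)
        if score > best_score:
            best_sample, best_score = sample, score
    return best_sample, best_score

if __name__ == "__main__":
    for n in (1, 4, 16):
        _, score = best_of_n(n)
        print(f"N={n:2d}  best reward={score:.3f}")
```

Every candidate here is drawn independently, so the other N-1 generations are discarded and nothing learned from a good candidate is reused when drawing the next one.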

However, these existing methods all have clear weaknesses. Best-of-N is inefficient, spending most of its computation on samples that are thrown away. Particle sampling is too conservative: it lacks active exploration and easily gets stuck in local optima. Fine-tuning-based methods either require additional training or tend to produce samples that lack diversity.

EvoSearch: The "Theory of Evolution" in the Field of AI Art

The core innovation of EvoSearch is to bring Darwinian evolution into the generation process. The method treats image generation as the evolution of a species: it first creates an initial "population" (random noise), then scores the intermediate results through a "fitness assessment", applies "survival of the fittest" to select the best individuals, and finally produces new candidates through specially designed "mutation" operations.
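The population, fitness, selection, and mutation cycle described above can be sketched as a generic evolutionary loop. This is a minimal toy under stated assumptions, not the authors' implementation: candidates are plain lists of floats, `fitness` is a hypothetical score, and parameters such as `pop_size` and `n_elite` are illustrative names.

```python
import random

def fitness(candidate):
    """Hypothetical fitness: negative squared distance to a target vector of ones."""
    return -sum((x - 1.0) ** 2 for x in candidate)

def mutate(parent, rng, scale=0.3):
    """Toy mutation: perturb every coordinate with Gaussian noise."""
    return [x + rng.gauss(0.0, scale) for x in parent]

def evolve(pop_size=8, n_elite=2, generations=10, dim=4, seed=0):
    """Run a simple select-and-mutate loop; return the best candidate and score history."""
    rng = random.Random(seed)
    # Initial population: random Gaussian "noise" vectors.
    population = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(pop_size)]
    history = []
    for _ in range(generations):
        # Fitness assessment and selection: rank candidates, keep the top n_elite.
        ranked = sorted(population, key=fitness, reverse=True)
        history.append(fitness(ranked[0]))
        elites = ranked[:n_elite]
        # Mutation: refill the population with perturbed copies of the elites.
        children = [mutate(rng.choice(elites), rng) for _ in range(pop_size - n_elite)]
        population = elites + children
    return max(population, key=fitness), history

if __name__ == "__main__":
    best, history = evolve()
    print("best fitness per generation:", [round(h, 3) for h in history])
```

Because the elites are carried over unchanged, the best score never decreases from one generation to the next, while the mutated children let the search move beyond the initial candidate pool.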

The mutation operation is EvoSearch's key technical contribution. For the initial noise, the system mutates a candidate by adding a controlled amount of Gaussian noise; for intermediate states in the denoising process, it introduces controllable perturbations modeled on the way randomness is injected in stochastic differential equation (SDE) sampling. This design allows the search to explore new regions while retaining good "genes".
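The two kinds of mutation can be illustrated with a small sketch. The formulas below are assumptions chosen for illustration, not taken from the paper: the initial-noise mutation blends the parent with fresh Gaussian noise so that a unit-variance parent stays roughly unit-variance, and the intermediate-state mutation simply adds noise scaled by a hypothetical step-dependent level `sigma_t`. The names `beta` and `sigma_t` are illustrative.

```python
import math
import random

def mutate_initial_noise(noise, beta, rng):
    """Blend the parent with fresh Gaussian noise (assumed form).

    With 0 <= beta <= 1, sqrt(1 - beta^2) * x + beta * eps keeps unit variance
    when x and eps are independent standard normals. beta = 0 returns the parent.
    """
    keep = math.sqrt(1.0 - beta ** 2)
    return [keep * x + beta * rng.gauss(0.0, 1.0) for x in noise]

def mutate_intermediate(state, sigma_t, rng):
    """Add a perturbation scaled by sigma_t, a hypothetical noise level for this step."""
    return [x + sigma_t * rng.gauss(0.0, 1.0) for x in state]

if __name__ == "__main__":
    rng = random.Random(0)
    parent = [rng.gauss(0.0, 1.0) for _ in range(5)]
    print("parent       :", [round(x, 2) for x in parent])
    print("small mutation:", [round(x, 2) for x in mutate_initial_noise(parent, 0.1, rng)])
    print("large mutation:", [round(x, 2) for x in mutate_initial_noise(parent, 0.9, rng)])
```

In this sketch, a small `beta` keeps the child close to its parent (retaining good "genes"), while a large `beta` moves it toward a fresh random draw (exploring new regions).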

Compared with traditional methods, EvoSearch has three major advantages. First, it explores actively rather than screening passively, so it can move beyond the initial candidate pool. Second, it balances exploration and exploitation, avoiding premature convergence to local optima. Third, it generalizes well: it applies to a wide range of diffusion and flow models without modifying the model architecture or requiring additional training.

Experimental Results: Gains Across the Board

The research team ran comprehensive tests on image and video generation tasks and reports that EvoSearch significantly outperforms existing baseline methods on all metrics.

In image generation, as inference compute increases, the quality and text alignment of images generated with EvoSearch keep improving steadily, while other methods quickly plateau. For complex or ambiguous prompts, EvoSearch follows the prompt more accurately and also shows richer diversity in aspects such as background and pose.

The results in video generation are even more notable. With both the Wan 1.3B model and the HunyuanVideo 13B model, EvoSearch significantly outperforms the baseline methods. Most notably, when the Wan 1.3B model is given the same inference-time budget as the Wan 14B model, the smaller model combined with EvoSearch can match or even surpass the larger one.

It is worth noting that even when the evaluation metrics do not fully align with the reward function used during the search, EvoSearch still generalizes well and is less prone to over-optimizing for a specific reward function. In human evaluations, videos generated with EvoSearch achieved higher win rates in visual quality, motion quality, text alignment, and overall quality.

Technical Insights and Future Prospects

The success of EvoSearch offers important insights for the field of AI generation. First, as training costs keep rising, investing more computation at inference time to improve model performance is a highly valuable direction. Second, bringing the ideas of selection and variation from biological evolution into AI generation can effectively overcome the limitations of traditional search methods.

More importantly, the method's success depends on a deep understanding of the denoising process in diffusion and flow models. EvoSearch takes into account the structure of the state space these models pass through during denoising and designs its mutation strategies accordingly, which lets it explore the space of possible outputs more effectively.

Of course, there is room for further optimization of EvoSearch. The research team points out that future improvement directions include designing smarter mutation strategies and better balancing exploration and computational efficiency.

This work points to an important trend: even without chasing ever-larger models and more training data, smarter search strategies at inference time can unlock more of a model's potential. EvoSearch is opening an era of "intelligent evolution" in AI creation, allowing small models to produce impressive work.

Project homepage: https://tinnerhrhe.github.io/evosearch/

Code: https://github.com/tinnerhrhe/EvoSearch-codes

Paper: https://arxiv.org/abs/2505.17618

AIbase基地

This article is from AIbase Daily