Slow video generation speed and high cost have long been pain points in the AIGC field. This time, the Tencent Hunyuan team has provided a new solution.

The Tencent Hunyuan team has officially open-sourced a new video generation acceleration solution called DisCa, with both code and model weights now publicly available. The work has been accepted by CVPR 2026, the top conference in computer vision, and is the first work, in either academia or industry, to explore learnable feature-caching acceleration on a distilled, few-step model.


The core idea of DisCa is to further reduce inference cost on models that have already been distilled down to very few inference steps. Traditional feature-caching schemes work well on multi-step generation models, but applying them directly to few-step distilled models introduces large cache errors that cause the generated results to break down. DisCa addresses this by introducing a lightweight neural network predictor, trained via adversarial learning, which learns to accurately predict how subsequent features evolve from the cached features. This extends the acceleration ratio to 11.8x while maintaining generation quality.
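To make the caching idea concrete, here is a minimal illustrative sketch. The names (`heavy_block`, `CachePredictor`, `denoise_with_cache`) and the linear predictor are hypothetical stand-ins, not DisCa's actual architecture: the real predictor is a neural network trained adversarially, and the real heavy block is a diffusion transformer. The sketch only shows the control flow of replacing full feature computation with a cheap prediction from cached features on some steps.

```python
import numpy as np

def heavy_block(x, t):
    # Stand-in for an expensive transformer block (hypothetical).
    return np.tanh(x + 0.1 * t)

class CachePredictor:
    """Tiny linear predictor standing in for DisCa's learned module.
    In DisCa this is a lightweight neural network trained adversarially;
    here it is an untrained identity-like map, purely for illustration."""
    def __init__(self, dim):
        self.W = np.eye(dim)       # would be learned in practice
        self.b = np.zeros(dim)

    def __call__(self, cached_feat, t):
        # Predict the feature at step t from the last cached feature.
        return cached_feat @ self.W + self.b + 0.1 * t

def denoise_with_cache(x, steps, cache_every=2):
    predictor = CachePredictor(x.shape[-1])
    cached = None
    for t in range(steps):
        if t % cache_every == 0:
            feat = heavy_block(x, t)        # full compute, refresh cache
            cached = feat
        else:
            feat = predictor(cached, t)     # cheap predicted feature
        x = x + 0.5 * (feat - x)            # toy update rule
    return x
```

With `cache_every=2`, half of the expensive block evaluations are replaced by predictor calls; in a distilled few-step model, a naive cache (reusing `cached` unchanged) would accumulate large errors, which is the gap the learned predictor is meant to close.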


Another noteworthy direction is R-MeanFlow. The MeanFlow approach from the MIT team led by Kaiming He performs well in image generation, but the Tencent Hunyuan team found that applying it directly to the more complex task of video generation imposes an overly aggressive "one-step generation" objective that harms training. Their fix is simple and direct: since one-step generation is not yet feasible, they remove the aggressive cases during training, constraining the step range to a reasonable interval. This conclusion aligns with concurrent research from the MIT and Google teams, and the technique has been applied in the actual training of HunyuanVideo-1.5, currently the best open-source video generation model.
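The step-range constraint can be sketched as a rejection sampler over the pair of times used in MeanFlow-style training. This is an assumption-laden illustration: the function name `sample_step_pair`, the uniform sampling, and the `max_gap` threshold of 0.25 are all hypothetical choices, not the values or mechanism used by R-MeanFlow. The point is only that pairs whose gap approaches the full interval (the near-"one-step" cases) are excluded from training.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_step_pair(max_gap=0.25):
    """Draw a time pair (r, t) with 0 <= r < t <= 1 for training,
    rejecting overly aggressive large-gap pairs. A gap of t - r near 1
    would correspond to the one-step objective that the source says
    hurts video-generation training; max_gap is a hypothetical bound."""
    while True:
        r, t = sorted(rng.uniform(0.0, 1.0, size=2))
        if 0.0 < t - r <= max_gap:
            return r, t
```

In practice one would likely sample the gap directly from the allowed interval rather than reject, but the rejection form makes the constraint explicit.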