In the field of image editing, a new contender is changing the game. TuZhan Intelligent and the UniWorld team from Peking University have introduced a new-generation image editing model, UniWorld-V2. The model not only surpasses Nano Banana in fine-grained control over image edits, but also excels at understanding Chinese instructions.

UniWorld-V2 is built on an innovative visual reinforcement learning framework, UniWorld-R1, which is the first to apply reinforcement learning policy optimization to image editing, significantly improving editing accuracy and flexibility. Compared with traditional supervised fine-tuning, UniWorld-R1 is designed to mitigate data overfitting and poor generalization, allowing the model to respond better to diverse editing instructions.
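
To give a feel for what policy optimization means in this setting, here is a deliberately minimal toy sketch. It is not the actual UniWorld-R1 framework (which optimizes an image editing model against edit-quality rewards); instead, the "policy" merely chooses among three hypothetical discrete edit actions, and the hand-coded reward stands in for an edit-quality score. The point it illustrates is the core contrast with supervised fine-tuning: the model is updated from reward signals on its own sampled outputs rather than from fixed ground-truth targets.

```python
import math
import random

random.seed(0)

# Hypothetical edit actions; the reward below pretends the user asked
# for a gesture change, mirroring the article's "OK gesture" example.
ACTIONS = ["change_gesture", "replace_text", "adjust_lighting"]
logits = [0.0, 0.0, 0.0]  # policy parameters

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def reward(action):
    # Stand-in for an edit-quality score: 1 if the edit matches intent.
    return 1.0 if action == "change_gesture" else 0.0

lr = 0.5
for _ in range(200):
    probs = softmax(logits)
    # Sample an action from the current policy, then score it.
    a = random.choices(range(len(ACTIONS)), weights=probs)[0]
    r = reward(ACTIONS[a])
    # REINFORCE update: d log pi(a) / d logit_i = 1[i == a] - probs[i].
    for i in range(len(logits)):
        grad = (1.0 if i == a else 0.0) - probs[i]
        logits[i] += lr * r * grad

print(softmax(logits))  # probability mass shifts toward "change_gesture"
```

Because only rewarded samples drive updates, the policy's probability of the intent-matching action climbs toward 1 over the 200 iterations; the real framework applies the same reward-driven principle at the scale of a full image editing model.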

For example, when a user asks the AI to change a girl's gesture to "OK", UniWorld-V2 understands and applies the edit accurately, whereas Nano Banana fails to capture the user's intent. Even more impressively, in a poster-editing example, UniWorld-V2 renders complex Chinese artistic fonts, such as "Moon Full Mid-Autumn", with clear results and accurate semantics.

The model's fine-grained control is also remarkable. With a simple box-selection operation, users can specify the editing region and perform precise adjustments, such as moving a specific object outside the box. UniWorld-V2 also performs well in light-and-shadow handling, blending objects naturally into a scene and improving its overall harmony.

On the GEdit-Bench and ImgEdit benchmarks, UniWorld-V2 leads well-known models such as OpenAI's GPT-Image-1 and Gemini 2.0, scoring 7.83 on GEdit-Bench and 4.49 on ImgEdit. These results are backed by the versatility of the UniWorld-R1 framework, which not only boosts UniWorld-V2's performance but also brings significant improvements to other models.

The paper, code, and models for UniWorld-R1 are publicly available on GitHub and Hugging Face, laying the groundwork for future research. This release not only advances the multimodal field but also opens new possibilities for image editing technology.

Paper address:

https://arxiv.org/abs/2510.16888

GitHub link:

https://github.com/PKU-YuanGroup/UniWorld