Today, the Qwen team officially released Qwen-Image-Layered, a new image generation model. Built on an innovative architecture of the team's own design, it moves past the limitations of traditional AI image editing: through "layer decomposition" technology, it turns a static image into something fully editable, opening an era of precise editing where you can edit exactly the part you point to.


Current AI image editing suffers from two major pain points: global edits often disrupt the consistency of untouched areas, while mask-based local edits struggle with occlusions and blurred boundaries. Qwen-Image-Layered proposes an innovative "image decoupling" idea, automatically decomposing an image into semantically independent RGBA layers in an "onion-peeling" fashion. Each layer carries its own color (RGB) and transparency (alpha) and can be edited independently without affecting the others.
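To see why independent RGBA layers make local editing safe, consider how such layers recombine into a flat image. The sketch below is not the model's actual decoder; it is a minimal illustration of the standard alpha "over" compositing operator, using hypothetical 2×2 numpy layers:

```python
import numpy as np

def composite_layers(layers):
    """Composite a back-to-front list of float RGBA layers (values in [0, 1])
    into one RGB image using the standard alpha "over" operator.
    Illustrative only; not Qwen-Image-Layered's actual decoder."""
    h, w, _ = layers[0].shape
    out = np.zeros((h, w, 3), dtype=np.float64)
    for layer in layers:  # iterate from backmost to frontmost layer
        rgb, alpha = layer[..., :3], layer[..., 3:4]
        out = rgb * alpha + out * (1.0 - alpha)
    return out

# Two hypothetical 2x2 layers: an opaque red background and a
# half-transparent blue overlay.
bg = np.zeros((2, 2, 4)); bg[..., 0] = 1.0; bg[..., 3] = 1.0
fg = np.zeros((2, 2, 4)); fg[..., 2] = 1.0; fg[..., 3] = 0.5
img = composite_layers([bg, fg])
print(img[0, 0])  # [0.5 0.  0.5] -- red shows through the 50% blue overlay
```

Because the flat image is just a deterministic blend of the layers, replacing or editing one layer and recompositing leaves every other layer's pixels bit-for-bit unchanged.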

The model's core highlights are significant: a new RGBA-VAE lets RGB images and RGBA layers share the same latent space, solving problems such as uneven layer distribution and blurred boundaries; the VLD-MMDiT architecture flexibly handles anywhere from 3 to more than 10 layers, with layers cooperating through attention mechanisms, eliminating the need for inefficient recursive decomposition; and through a multi-stage evolution, from generating single images, to generating multiple layers, to decomposing arbitrary RGB images, the model advances from generation capability to understanding capability.

In terms of applications, the model supports operations such as recoloring, object replacement, text modification, element deletion, and scaling and moving. Notably, it supports decomposition into a variable number of layers: the same image can be split into 3 or 8 layers as needed, and any layer can itself be decomposed further, enabling arbitrarily fine-grained refinement.
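A recoloring edit on one layer, for instance, never touches the pixels of any other layer. The following is a minimal sketch under the same assumptions as standard alpha compositing, with hypothetical 2×2 numpy layers (again, not the model's own editing pipeline):

```python
import numpy as np

def over(bottom_rgb, layer_rgba):
    """One step of alpha "over" compositing for float images in [0, 1]."""
    rgb, a = layer_rgba[..., :3], layer_rgba[..., 3:4]
    return rgb * a + bottom_rgb * (1.0 - a)

# Hypothetical layers: an opaque green background and a 50% red overlay.
background = np.zeros((2, 2, 4)); background[..., 1] = 1.0; background[..., 3] = 1.0
overlay = np.zeros((2, 2, 4)); overlay[..., 0] = 1.0; overlay[..., 3] = 0.5

# Recolor ONLY the overlay layer from red to blue; the background layer
# is never modified.
recolored = overlay.copy()
recolored[..., :3] = [0.0, 0.0, 1.0]

bg_rgb = background[..., :3] * background[..., 3:4]
before = over(bg_rgb, overlay)     # green + 50% red  -> [0.5 0.5 0. ]
after  = over(bg_rgb, recolored)   # green + 50% blue -> [0.  0.5 0.5]
```

Note that the green channel (the background's contribution) is identical before and after the edit, which is exactly the consistency guarantee that layer-wise editing provides.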

The technical report, code repository, and demo of Qwen-Image-Layered are now live on arXiv, GitHub, ModelScope, and Hugging Face. The Qwen team says it hopes the model, by restructuring images into editable layers, will give users intuitive, accurate, and robust image editing capabilities.

Technical Report:

https://arxiv.org/abs/2512.15603

GitHub:

https://github.com/QwenLM/Qwen-Image-Layered

ModelScope: 

https://www.modelscope.cn/models/Qwen/Qwen-Image-Layered

Hugging Face: 

https://huggingface.co/Qwen/Qwen-Image-Layered

Demo: 

https://www.modelscope.cn/studios/Qwen/Qwen-Image-Layered