Today, Alibaba officially launched Wan2.7-Image, a unified large model for image generation and editing. Beyond a clear jump in visual quality, its across-the-board capability upgrades target two long-standing weaknesses of AI-generated images: the "same AI face" look and poor instruction following.
Goodbye to AI Faces: The "One Person, One Face" Era
Wan2.7-Image significantly upgrades virtual character customization. Users can dial in everything from bone structure and eye shape down to subtle facial details, precisely specifying traits such as an oval face, phoenix eyes, or deep-set eye sockets. This moves AI portraits past their mechanical sameness toward genuinely individual faces.
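As a rough illustration, fine-grained character control could be expressed as a structured prompt payload. Everything below, field names included, is a hypothetical sketch rather than the published Wan2.7-Image interface:

```python
# Hypothetical sketch of fine-grained character control; these field
# names are illustrative assumptions, not the documented Wan2.7-Image API.
character_spec = {
    "prompt": "studio portrait of a young woman, soft lighting",
    "face_attributes": {            # hypothetical structured controls
        "face_shape": "oval",
        "eye_shape": "phoenix eyes",
        "eye_sockets": "deep-set",
    },
    "seed": 42,  # fixing the seed keeps "one person, one face" reproducible
}
print(character_spec)
```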

"Color Palette" and "Print-Quality" Text Rendering
In terms of artistic expression, the model now supports a **"Color Palette" feature**: users can extract the color proportions of a reference image (such as Matisse's reds or Van Gogh's yellows) with one click and transfer them faithfully to new work. Wan2.7-Image is also strong at long-text rendering: it accepts inputs of up to 3K tokens and can reliably render a full A4 page containing complex formulas and tables at print quality, in any of 12 supported languages.
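The announcement does not detail how the palette is computed, but the extraction step itself is straightforward to sketch. The snippet below (Python with Pillow; the filename is a placeholder) pulls dominant colors and their proportions from a reference image, the kind of signal the feature presumably transfers:

```python
# A minimal sketch of extracting a reference image's dominant color
# proportions. Uses Pillow's octree quantization; the Wan2.7-Image API
# itself is not shown here.
from PIL import Image

def extract_palette(path: str, n_colors: int = 5) -> list[tuple[tuple[int, int, int], float]]:
    """Return (RGB, proportion) pairs for the dominant colors."""
    img = Image.open(path).convert("RGB")
    # Quantization clusters all pixels down to at most n_colors colors.
    quantized = img.quantize(colors=n_colors, method=Image.Quantize.FASTOCTREE)
    palette = quantized.getpalette()  # flat [r, g, b, r, g, b, ...]
    total = img.width * img.height
    result = []
    for count, index in quantized.getcolors(maxcolors=n_colors):
        rgb = tuple(palette[index * 3 : index * 3 + 3])
        result.append((rgb, count / total))
    return sorted(result, key=lambda pair: -pair[1])

# "matisse_red.jpg" is a placeholder for any local reference image.
for rgb, share in extract_palette("matisse_red.jpg"):
    print(f"{rgb}: {share:.1%}")
```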

Interactive Editing and Multi-Subject Consistency
The model has powerful interactive editing capabilities: users can precisely select elements and then align, move, or replace them. For example, two characters in an image can be selected and swapped, or ice cubes in a drink replaced with fruit, with pixel-level control. The model also maintains multi-subject consistency across up to nine images, keeping style and identity coherent when generating, say, an AI girl group or a matched furniture set.
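Region-based editing of this kind typically pairs the source image with a selection mask and an instruction. The sketch below shows that shape in hypothetical terms; the payload fields and file names are assumptions, not the documented API:

```python
# A hedged illustration of region-based editing: build a binary mask for
# the selected element and attach it to an edit request. The payload
# fields here are hypothetical, not the documented Wan2.7-Image API.
import base64
import io

from PIL import Image, ImageDraw

def selection_mask(size: tuple[int, int], box: tuple[int, int, int, int]) -> bytes:
    """White-on-black PNG mask marking the user's selected region."""
    mask = Image.new("L", size, 0)
    ImageDraw.Draw(mask).rectangle(box, fill=255)
    buf = io.BytesIO()
    mask.save(buf, format="PNG")
    return buf.getvalue()

# "drink.png" stands in for the user's source image; the box marks the
# ice cubes to be replaced.
request = {
    "image": base64.b64encode(open("drink.png", "rb").read()).decode(),
    "mask": base64.b64encode(selection_mask((1024, 1024), (400, 620, 560, 760))).decode(),
    "instruction": "replace the ice cubes with strawberries",
}
```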

Underlying Technological Breakthroughs and Industry Empowerment
Wan2.7-Image adopts a unified architecture for generation and understanding, mapping both into a shared latent space. Rather than guessing at pixels to fit the text, the model works from an underlying semantic understanding of the prompt. It launched alongside a Wan2.7-Image-pro version, which offers more stable composition and more accurate understanding.
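As a conceptual aid (not Wan2.7-Image's actual architecture), a shared latent space simply means that text and images are encoded into the same vector space, so semantic alignment becomes a direct comparison rather than pixel-level guessing. A toy PyTorch sketch:

```python
# Toy illustration of a shared latent space for text and images; all
# dimensions and layers are arbitrary assumptions for demonstration.
import torch
import torch.nn as nn

LATENT_DIM = 512

class SharedLatentModel(nn.Module):
    def __init__(self, vocab_size: int = 30000, patch_dim: int = 768):
        super().__init__()
        self.token_embed = nn.Embedding(vocab_size, LATENT_DIM)
        self.patch_proj = nn.Linear(patch_dim, LATENT_DIM)

    def encode_text(self, token_ids: torch.Tensor) -> torch.Tensor:
        # Mean-pool token embeddings into one latent vector per prompt.
        return self.token_embed(token_ids).mean(dim=1)

    def encode_image(self, patches: torch.Tensor) -> torch.Tensor:
        # Project image patches into the same space and pool.
        return self.patch_proj(patches).mean(dim=1)

model = SharedLatentModel()
text_z = model.encode_text(torch.randint(0, 30000, (1, 16)))
image_z = model.encode_image(torch.randn(1, 64, 768))
# Both vectors live in one space, so alignment is a direct similarity.
print(torch.cosine_similarity(text_z, image_z).item())
```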

The model is already seeing broad use in short-drama production (one actor playing multiple roles), e-commerce advertising (one model photo reused across listings), education and research, and social entertainment. Users can call the model via its API.