Alibaba Tongyi Qianwen Open Sources New Text-to-Image Model Qwen-Image

AIbase基地

Published in AI News · 5 min read · Aug 5, 2025

Qwen-Image, a 20-billion-parameter multimodal diffusion transformer (MMDiT) model, is the first image generation foundation model in the Qwen series, and it has been released as open source. The model makes notable advances in complex text rendering and precise image editing, and it performs strongly on multiple public benchmarks, making it a rising star in the field of image generation and editing.

Qwen-Image stands out for its text rendering, which supports multi-line layouts, paragraph-level text generation, and fine-grained detail, with high-fidelity output in both English and Chinese. For example, when rendering an anime scene in the style of Studio Ghibli, the model accurately presents shop signs, character postures, and expressions, and even the small text on wine barrels is clearly legible. Similarly, when rendering Chinese couplets, Qwen-Image not only draws the left and right couplets and the horizontal scroll accurately but also applies a convincing calligraphy style.


Qwen-Image also performs well on English text rendering. Whether the text appears in a bookstore window or in a complex infographic, the model generates the content accurately and integrates it into the overall composition, so the image is both visually coherent and informative. It also maintains accuracy and clarity when the text is smaller or denser, for example when generating long passages on a sheet of paper held in a hand, or a full handwritten paragraph on a glass panel.

Aside from text rendering, Qwen-Image also demonstrates extraordinary strength in image editing. Through an enhanced multi-task training paradigm, the model can maintain consistency during the editing process, supporting various operations such as style transfer, object addition/removal, detail enhancement, and adjustment of human poses. This enables ordinary users to easily achieve professional-level image editing, significantly lowering the technical barrier for visual content creation.

Qwen-Image achieves state-of-the-art performance on multiple public benchmarks, from general image generation benchmarks such as GenEval, DPG, and OneIG-Bench to image editing benchmarks such as GEdit, ImgEdit, and GSO, demonstrating broad strength in both image generation and editing. In Chinese text rendering in particular, Qwen-Image greatly surpasses existing state-of-the-art models.

Qwen-Image is currently available as open source on ModelScope, Hugging Face, and GitHub, together with a detailed technical report and demos. Users can also visit Qwen Chat (chat.qwen.ai) and select the "image generation" feature to try the model firsthand.

ModelScope: https://modelscope.cn/models/Qwen/Qwen-Image

Hugging Face: https://huggingface.co/Qwen/Qwen-Image

GitHub: https://github.com/QwenLM/Qwen-Image

Technical report: https://qianwen-res.oss-cn-beijing.aliyuncs.com/Qwen-Image/Qwen_Image.pdf

Demo: https://modelscope.cn/aigc/imageGeneration?tab=advanced
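
For readers who want to try the open-sourced weights locally, the following is a minimal sketch rather than an official recipe. It assumes the Hugging Face repository listed above can be loaded through the diffusers library's generic DiffusionPipeline loader, and that a GPU with enough memory for a model of this size is available. The helper build_sign_prompt, the default step count, and the output file name are hypothetical and chosen only for illustration; wrapping the text to be rendered in quotation marks is a prompt convention assumed here, not something the article specifies.

```python
"""Sketch: text-to-image generation with Qwen-Image via Hugging Face diffusers."""

# Repository name taken from the Hugging Face link listed in the article.
MODEL_ID = "Qwen/Qwen-Image"


def build_sign_prompt(scene: str, sign_text: str) -> str:
    """Hypothetical helper: quote the text that should appear in the image."""
    return f'{scene}, with a sign that reads "{sign_text}"'


def generate(prompt: str, out_path: str = "qwen_image.png",
             steps: int = 50, seed: int = 0) -> str:
    """Load the pipeline and render one image (requires torch and diffusers)."""
    # Imported lazily so the prompt helper works without these packages.
    import torch
    from diffusers import DiffusionPipeline

    # Assumption: bfloat16 on GPU; fall back to float32 on CPU (very slow).
    device = "cuda" if torch.cuda.is_available() else "cpu"
    dtype = torch.bfloat16 if device == "cuda" else torch.float32

    pipe = DiffusionPipeline.from_pretrained(MODEL_ID, torch_dtype=dtype)
    pipe = pipe.to(device)

    # A fixed seed makes repeated runs comparable.
    generator = torch.Generator(device=device).manual_seed(seed)
    image = pipe(
        prompt=prompt,
        num_inference_steps=steps,
        generator=generator,
    ).images[0]
    image.save(out_path)
    return out_path


if __name__ == "__main__":
    prompt = build_sign_prompt("A Ghibli-style anime street at dusk", "Qwen Coffee")
    print(prompt)
    # Uncomment to run the model; this downloads the full weights.
    # generate(prompt)
```

The model call is kept behind a commented-out line so the script can be run first to inspect the prompt without triggering the large download.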



This article is from AIbase Daily
