Tongyi Qianwen Launches the Multimodal Unified Understanding and Generation Model Qwen VLo

AIbase基地

Published in AI News · 6 min read · Jun 28, 2025

Recently, the Qwen VLo multimodal large model was officially launched. The model has made significant progress in image content understanding and generation, providing users with a brand-new visual creation experience.

According to the announcement, Qwen VLo is a comprehensive upgrade that builds on the strengths of the original Qwen-VL series. The model can not only accurately "understand" the world but also perform high-quality re-creation based on that understanding, moving from perception to generation. Users can try the new model now on the Qwen Chat (chat.qwen.ai) platform.

The unique feature of Qwen VLo lies in its progressive generation method. When generating images, the model uses a step-by-step construction strategy from left to right and top to bottom, continuously optimizing and adjusting the predicted content during the process to ensure the final result is harmonious and consistent. This generation mechanism not only improves visual effects but also provides users with a more flexible and controllable creative process.
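The left-to-right, top-to-bottom order described above can be pictured with a minimal sketch. This is not Qwen VLo's implementation: the patch grid, the `raster_order` and `generate_progressively` helpers, and the `toy_predict` stand-in are all hypothetical names introduced for illustration, and the sketch leaves out the in-process refinement of earlier predictions that the article mentions.

```python
from typing import Any, Callable, Iterator, List, Tuple

Canvas = List[List[Any]]


def raster_order(rows: int, cols: int) -> Iterator[Tuple[int, int]]:
    """Yield (row, col) patch coordinates top to bottom, left to right within a row."""
    for r in range(rows):
        for c in range(cols):
            yield (r, c)


def generate_progressively(
    rows: int, cols: int, predict: Callable[[Canvas, int, int], Any]
) -> Canvas:
    """Fill a patch grid one cell at a time in raster order.

    `predict` is a hypothetical stand-in for a model call. It sees the
    partially filled canvas, so each patch can depend on the patches
    generated before it.
    """
    canvas: Canvas = [[None] * cols for _ in range(rows)]
    for r, c in raster_order(rows, cols):
        canvas[r][c] = predict(canvas, r, c)
    return canvas


def toy_predict(canvas: Canvas, r: int, c: int) -> int:
    """Toy predictor: return how many patches were already generated."""
    return sum(value is not None for row in canvas for value in row)


if __name__ == "__main__":
    for row in generate_progressively(2, 3, toy_predict):
        print(row)
```

With the toy predictor, each cell simply records its position in the generation order, which makes the raster traversal visible in the output.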

Qwen VLo also shows strong content understanding and re-creation capabilities. Compared with previous multimodal models, it better maintains semantic consistency during generation, avoiding problems such as turning a car into a different object or losing key structural features of the original image. For example, when a user provides a photo of a car and asks for a color change, Qwen VLo can accurately identify the car model, retain its original structural features, and change the color naturally, so that the result is both realistic and in line with the request.

Additionally, Qwen VLo supports open instruction editing and modification. Users can propose various creative instructions through natural language, such as changing the art style, adding elements, or adjusting the background. The model can flexibly respond to these instructions and generate results that meet user expectations. Whether it's art style transfer, scene reconstruction, or detail refinement, Qwen VLo can easily handle them.

Notably, Qwen VLo accepts instructions in multiple languages, including Chinese and English, giving users around the world a unified and convenient way to interact with it. Whichever language they use, users only need to describe their needs briefly, and the model can quickly understand the request and output the desired result.

In practical applications, Qwen VLo covers a range of functions. It can directly generate and modify images, for example by replacing backgrounds, adding subjects, or performing style transfer. It can also carry out large-scale modifications based on open-ended instructions, as well as visual perception tasks such as detection and segmentation. In addition, Qwen VLo supports multi-image input and generation, along with image detection and annotation.

Aside from cases where text and images are input simultaneously, Qwen VLo also supports direct image generation from text, including general images and Chinese and English posters. The model uses dynamic resolution training and supports image generation at any resolution and aspect ratio, allowing users to generate image content suitable for different scenarios according to actual needs.
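As a rough illustration of what working with arbitrary aspect ratios involves on the user side, the sketch below picks pixel dimensions for a requested ratio. It is an assumption-laden example, not part of Qwen VLo: the article does not describe how a resolution is specified, and the `fit_dimensions` helper, the 1024×1024 pixel budget, and the rounding to multiples of 16 are all hypothetical choices made for illustration.

```python
import math
from typing import Tuple


def fit_dimensions(
    aspect_w: float,
    aspect_h: float,
    target_pixels: int = 1024 * 1024,  # hypothetical pixel budget
    multiple: int = 16,                # hypothetical rounding granularity
) -> Tuple[int, int]:
    """Return (width, height) near `target_pixels` with the requested aspect ratio.

    Both sides are rounded to the nearest multiple of `multiple`, so the
    final ratio may differ slightly from the one requested.
    """
    if aspect_w <= 0 or aspect_h <= 0:
        raise ValueError("aspect ratio components must be positive")
    # Solve (aspect_w * s) * (aspect_h * s) = target_pixels for the scale s.
    scale = math.sqrt(target_pixels / (aspect_w * aspect_h))
    width = max(multiple, round(aspect_w * scale / multiple) * multiple)
    height = max(multiple, round(aspect_h * scale / multiple) * multiple)
    return width, height


if __name__ == "__main__":
    print(fit_dimensions(1, 1))    # square image
    print(fit_dimensions(16, 9))   # widescreen poster
```

Keeping the pixel budget fixed while varying the ratio is one simple way to request images of comparable size for different layouts.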

Qwen VLo is currently still in preview. Although it has shown powerful capabilities, it has some shortcomings. For example, generated content may be factually inaccurate or may not be fully consistent with the original image. The development team said it will continue to iterate on the model to improve its performance and stability.

Try it at: chat.qwen.ai

Silicon-Based Flow Launches Ling-mini-2.0 with Ant Group, Achieving Both Speed and Performance

Recently, the Silicon-Based Flow Large Model Service Platform officially launched Ant Group's newly open-sourced Ling-mini-2.0. The new model offers very high generation speed while maintaining advanced performance, delivering strong capability at a small scale. Ling-mini-2.0 adopts a MoE architecture with a total of 16B parameters, but activates only 1.4B parameters per token during generation, significantly improving generation speed. This design not only enables the model to process

Anthropic Confirms Decline in Claude Model Quality and Has Fixed It

Recently, the artificial intelligence company Anthropic officially confirmed that the Claude series models experienced a decline in response quality in recent days, and said that it has since resolved the issue. The company emphasized that the problem was not due to demand or cost factors, but was unexpected. According to the information, the problem was connected to two technical failures. The first failure took place between August 5th and September 4th, mainly affecting a small portion of Claude Sonnet 4 requests. Although the issue expanded after August 29th, it is fortunate that...

AI Daily: Tencent Open Sources Image Model HunyuanImage2.1; Aishiketech Secures $60 Million in Funding; Freepik Launches the Seedream4.0 Image Model

Welcome to the [AI Daily] column! This is your daily guide to exploring the world of artificial intelligence. Every day, we present the latest content in the AI field with a focus on developers, helping you understand technical trends and innovative AI product applications. Discover new AI products: https://app.aibase.com/zh
1. Tencent upgrades the Hunyuan image generation model HunyuanImage 2.1, supporting writing and 2K resolution. Tencent Hunyuan released the latest image generation model 'HunyuanImage 2.1 (HunyuanImage'

80 Billion Parameters With Only 3 Billion Activated! Qwen3 New Model's Inference Speed Increases by 10 Times

The Tongyi Qianwen team from Alibaba has just delivered a major surprise to developers worldwide. The Qwen3-Next-80B-A3B-Instruct model they are about to release completely redefines the traditional large model operation logic. This seemingly contradictory number combination hides a remarkable technological breakthrough: a total of 80 billion parameters, but only 3 billion are actually activated, like a super sports car using only one-tenth of its engine yet running ten times faster. Just hours ago, Hugging Face Tr

AI Talent Service Platform Mercor Seeks $10 Billion Valuation in Series C Funding, Revenue Approaching $500 Million

According to marketing materials obtained by TechCrunch and two informed sources, the startup Mercor, which provides AI model training experts for tech companies such as OpenAI and Meta, is negotiating with investors for its Series C funding round. The company is currently seeking a valuation of $10 billion or higher, an increase from the previously discussed target of $8 billion a few months ago. According to two sources, the venture capital firm Felicis, which has previously invested in Mercor, is considering increasing its investment in the Series C funding round.

AIShi Technology Secures $60 Million in Series B Funding, Led by Alibaba

AIShi Technology, a pioneer in the field of AI video generation, has announced a Series B funding round exceeding $60 million, led by Alibaba. Other investors include Dachen Capital, Shenzhen Investment Group, and the Beijing Artificial Intelligence Industry Investment Fund. This financing sets a new record for the largest single investment in the domestic video generation industry, marking AIShi Technology's continued expansion and technological innovation in the market. AIShi Technology was established in April 2023 and is committed to developing world-leading AI video generation large models and their applications. Since its establishment, the company has rapidly grown its user base beyond 1

This article is from AIbase Daily
