New Model CoMPaSS-FLUX.1: Enhancing Spatial Understanding in Flux Text-to-Image Generation

AIbase基地

Published in AI News · 4 min read · Sep 2, 2025

A research team has released CoMPaSS-FLUX.1, a LoRA adapter for the FLUX.1 text-to-image diffusion model that has drawn wide attention. The adapter is designed to improve how the model understands spatial relationships between objects when generating images, and it makes notable progress on prompts that specify a particular arrangement of objects.

CoMPaSS-FLUX.1 uses FLUX.1-dev as its base model. The adapter has a LoRA rank of 16 and a file size of approximately 50 MB, and it is used through the Diffusers framework. It is intended for generating images with accurate spatial relationships, including compositions that require a specific spatial arrangement, and it improves spatial understanding while preserving the base model's other capabilities.
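Since the adapter is distributed for Diffusers, loading it might look roughly like the sketch below. The repository id comes from the Hugging Face link in this article; the helper names `load_compass_pipeline` and `generate`, the sampling settings, and the assumption that the LoRA weight file can be resolved from the repository id alone are illustrative and not taken from the model card.

```python
BASE_MODEL_ID = "black-forest-labs/FLUX.1-dev"  # base model named in the article
LORA_REPO_ID = "blurgy/CoMPaSS-FLUX.1"          # from the article's Hugging Face link


def load_compass_pipeline(device: str = "cuda"):
    """Load FLUX.1-dev and attach the CoMPaSS LoRA adapter (hypothetical helper)."""
    # Imported lazily so this file can be imported without the heavy dependencies.
    import torch
    from diffusers import FluxPipeline

    pipe = FluxPipeline.from_pretrained(BASE_MODEL_ID, torch_dtype=torch.bfloat16)
    # Assumption: the LoRA weights can be resolved from the repo id alone.
    # If the repo holds several weight files, pass weight_name="..." explicitly.
    pipe.load_lora_weights(LORA_REPO_ID)
    return pipe.to(device)


def generate(pipe, prompt: str, seed: int = 0):
    """Generate one image; the sampling settings are illustrative, not tuned."""
    import torch

    generator = torch.Generator("cpu").manual_seed(seed)
    result = pipe(
        prompt,
        num_inference_steps=28,
        guidance_scale=3.5,
        generator=generator,
    )
    return result.images[0]
```

Calling `load_compass_pipeline()` and then `generate(pipe, "a photo of a dog to the right of a cat")` would download the base model, which is gated on Hugging Face and requires accepting its license first, and needs a GPU with enough memory for FLUX.1-dev.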

On benchmarks, CoMPaSS-FLUX.1 shows large gains over the base model: a 98% relative improvement on VISOR, a 67% improvement on the T2I-CompBench spatial test, and a 131% relative improvement on the GenEval position evaluation. Image fidelity also improves: its FID and CMMD scores are both lower than the base model's, and lower values on these metrics indicate better generation quality.
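The percentages above are relative improvements, so they describe change as a fraction of the base model's score rather than a gain in percentage points. The article does not give the absolute scores, so the baseline of 0.20 in the sketch below is purely hypothetical and only illustrates the arithmetic.

```python
def relative_improvement(base: float, improved: float) -> float:
    """Relative improvement of `improved` over `base`, in percent."""
    if base == 0:
        raise ValueError("base score must be non-zero")
    return (improved - base) / base * 100.0


def score_after(base: float, relative_pct: float) -> float:
    """Score implied by a base score and a relative improvement in percent."""
    return base * (1.0 + relative_pct / 100.0)


# Hypothetical baseline, NOT a figure from the article or the benchmark.
hypothetical_base = 0.20
implied = score_after(hypothetical_base, 131.0)  # more than double the baseline
print(round(implied, 3))
print(round(relative_improvement(hypothetical_base, implied), 1))
```

A relative improvement above 100% therefore means the score more than doubled, whatever the baseline was.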

For best results, write prompts that state the spatial relationship explicitly. The model performs best when the prompt uses clear spatial terms (such as "left," "right," "above," and "below") and relates two different objects, for example, "In the photo, A is to the right of B."
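The prompt guidance above can be turned into a small template helper. This is a minimal sketch: the function name `spatial_prompt`, the phrase table, and the "a photo of" prefix are assumptions chosen for illustration, not an official prompt format.

```python
# Maps a short relation keyword to the phrase used in the prompt.
RELATION_PHRASES = {
    "left": "to the left of",
    "right": "to the right of",
    "above": "above",
    "below": "below",
}


def spatial_prompt(subject: str, relation: str, reference: str,
                   prefix: str = "a photo of") -> str:
    """Build a prompt that relates two different objects explicitly."""
    if relation not in RELATION_PHRASES:
        raise ValueError(f"unknown relation: {relation!r}")
    if subject.strip().lower() == reference.strip().lower():
        raise ValueError("subject and reference must be different objects")
    return f"{prefix} {subject} {RELATION_PHRASES[relation]} {reference}"


print(spatial_prompt("a dog", "right", "a cat"))
```

Rejecting identical objects mirrors the advice to describe two different objects, since a prompt such as "a cat to the left of a cat" gives the model nothing to tell the two apart.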

CoMPaSS-FLUX.1 was trained on data from the SCOP (Spatial Constraint-Oriented Pairing) data engine, covering about 28,000 carefully selected object pairs. The pairs were chosen under strict criteria for visual importance, semantic distinction, spatial clarity, object relationships, and visual balance.
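To make the five selection criteria concrete, here is one way such a pair filter could be sketched over bounding boxes. Everything specific in it is an assumption: the article does not define the criteria, so the interpretation of each one (minimum area, different labels, bounded center distance, limited overlap, comparable sizes), the `Box` type, and every threshold are hypothetical and not the SCOP implementation.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """Object box in normalized image coordinates (top-left origin)."""
    label: str
    x: float
    y: float
    w: float
    h: float

    @property
    def area(self) -> float:
        return self.w * self.h

    @property
    def center(self) -> tuple:
        return (self.x + self.w / 2, self.y + self.h / 2)


def intersection_area(a: Box, b: Box) -> float:
    iw = min(a.x + a.w, b.x + b.w) - max(a.x, b.x)
    ih = min(a.y + a.h, b.y + b.h) - max(a.y, b.y)
    return max(iw, 0.0) * max(ih, 0.0)


def keep_pair(a: Box, b: Box, min_area: float = 0.05,
              max_center_dist: float = 0.6, max_overlap: float = 0.3,
              min_size_ratio: float = 0.5) -> bool:
    """Return True if the pair passes all five (hypothetical) checks."""
    small, large = sorted((a.area, b.area))
    if a.label == b.label:                                   # semantic distinction
        return False
    if small < min_area:                                     # visual importance
        return False
    if math.dist(a.center, b.center) > max_center_dist:      # spatial clarity
        return False
    if intersection_area(a, b) / small > max_overlap:        # object relationships
        return False
    if small / large < min_size_ratio:                       # visual balance
        return False
    return True


dog = Box("dog", 0.10, 0.30, 0.30, 0.40)
cat = Box("cat", 0.55, 0.30, 0.30, 0.40)
print(keep_pair(dog, cat))
```

A filter of this shape discards pairs whose relation would be ambiguous in the image, which is the stated purpose of curating the object pairs.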

Training ran for 24,000 steps with a batch size of 4 and a learning rate of 1e-4, using the AdamW optimizer with a weight decay of 1e-2.
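The reported hyperparameters can be collected into a small config, which also makes the scale of training easy to work out. This sketch assumes the batch size of 4 is the effective batch per optimizer step and that each object pair contributes one training sample; neither is stated in the article, and the `TrainConfig` type is hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainConfig:
    """Hyperparameters as reported in the article."""
    steps: int = 24_000
    batch_size: int = 4
    learning_rate: float = 1e-4
    weight_decay: float = 1e-2
    optimizer: str = "AdamW"
    lora_rank: int = 16


def samples_seen(cfg: TrainConfig) -> int:
    """Total samples processed, assuming batch_size is the effective batch."""
    return cfg.steps * cfg.batch_size


def approx_passes(cfg: TrainConfig, dataset_size: int) -> float:
    """Approximate passes over the data, assuming one sample per object pair."""
    return samples_seen(cfg) / dataset_size


cfg = TrainConfig()
print(samples_seen(cfg))                       # 24,000 steps x 4 per step
print(round(approx_passes(cfg, 28_000), 2))    # against about 28,000 pairs
```

Under those assumptions the run processes 96,000 samples, or roughly three and a half passes over the curated pairs.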

Hugging Face: https://huggingface.co/blurgy/CoMPaSS-FLUX.1

Key Points:
🌟 CoMPaSS-FLUX.1 significantly improves spatial understanding in text-to-image generation, especially for relationships between objects.
📊 Evaluations show clear gains on multiple benchmarks while generation quality remains high.
📚 The model was trained on a strictly filtered dataset so that generated images show clear, accurate spatial relationships.




This article is from AIbase Daily
