On March 23, Luma Labs released Uni-1, the company's first publicly available image generation model built on its Unified Intelligence architecture. The official website now offers free trial access, API pricing has been announced, and enterprise access channels are gradually rolling out.


Architecture Shift: From Diffusion Models to Autoregressive Generation

Uni-1 departs from the currently dominant diffusion-model approach and instead uses a decoder-only autoregressive Transformer, interleaving text tokens and image tokens into a single sequence so that reasoning and pixel generation happen in one unified pass.
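The interleaved-sequence idea can be sketched in a few lines. Everything below is an illustrative toy, not Luma's actual code: the `<boi>`/`<eoi>` markers, the token names, and the `generate` loop are invented to show how one decoder can emit text and image tokens from the same stream.

```python
# Toy sketch of a unified autoregressive sequence (hypothetical, for illustration).
BOI, EOI = "<boi>", "<eoi>"  # invented begin/end-of-image markers

def build_sequence(text_tokens, image_tokens):
    """Interleave a text prompt and image tokens into one sequence,
    the way a decoder-only model would consume them."""
    return list(text_tokens) + [BOI] + list(image_tokens) + [EOI]

def generate(prompt_tokens, next_token_fn, max_image_tokens=16):
    """Toy autoregressive loop: after the prompt, the SAME decoder
    emits image tokens one at a time until it produces EOI."""
    seq = list(prompt_tokens) + [BOI]
    for _ in range(max_image_tokens):
        tok = next_token_fn(seq)  # one model predicts both text and image tokens
        seq.append(tok)
        if tok == EOI:
            break
    return seq

# Dummy predictor standing in for the Transformer: emits two image tokens, then EOI.
patches = iter(["img_0", "img_1", EOI])
out = generate(["draw", "a", "bridge"], lambda seq: next(patches))
print(out)  # ['draw', 'a', 'bridge', '<boi>', 'img_0', 'img_1', '<eoi>']
```

Because there is no hand-off between a planning model and a separate renderer, the context built up during text reasoning stays available while image tokens are generated, which is the gap Jain describes eliminating.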

Luma CEO Amit Jain explained that traditional solutions usually first use a language model for planning and then hand it over to a diffusion model for generation, leading to information loss between the two stages. The design goal of Uni-1 is to eliminate this gap.

Jain previously worked at Apple and participated in Vision Pro engineering work.

Function: Reference Image Control and Cross-Style Generation

Uni-1 can generate images guided by one or more reference images, preserving the subject's identity, pose, and composition. In official tests of character consistency and portrait control, the multi-reference mode performed consistently.

Luma says the model supports 76 visual styles, spanning categories such as realistic photography, comics, and ukiyo-e.

In one demonstration, the prompt "Draw an infographic of the Golden Gate Bridge" led the model to automatically plan the layout, generate a structural diagram of the bridge, and annotate figures such as "1711 Meters," with its internal reasoning visible in real time.

Benchmarking: Leading in Spatial Reasoning and Reference Generation


Data published by Luma shows that Uni-1 scored 0.51 in the RISEBench reasoning benchmark, higher than Google Nano Banana 2's 0.50 and OpenAI GPT Image 1.5's 0.46; its spatial reasoning score was 0.58, and logical reasoning 0.32, about twice that of GPT Image.

On the ODinW-13 object detection benchmark, Uni-1 scored 46.2 mAP, close to Google Gemini 3 Pro's 46.3.

In human-preference Elo rankings, Uni-1 placed first in overall preference, style and editing, and reference generation, and second in text-to-image generation.

Pricing

The API is priced per token: $0.50 per million tokens for input text, $1.20 per million tokens for input images, $3.00 per million tokens for output text and chain of thought, and $45.45 per million tokens for output images.

Converted to per-image cost: text-to-image at 2048px runs approximately $0.0909, editing with a single reference image about $0.0933, and eight reference images about $0.1101.

VentureBeat reported that in enterprise scenarios with 2K resolution, Uni-1 costs 10% to 30% less than Google Nano Banana 2.

Background

Luma Labs previously focused on video generation products like Dream Machine (Ray3 series). On March 5, the company released the Luma Agents creative agent platform based on the Unified Intelligence architecture. Uni-1 is the first application of this architecture in a static image product.

Within hours of the release, related posts on the X platform received over 2.3 million views. Luma stated that subsequent video and audio versions will be launched, but the specific timing has not been disclosed.

Trial link: lumalabs.ai/uni-1