Apple researchers recently introduced Manzano, a new image model designed to handle both image understanding and generation. Supporting both capabilities in one model is a technical challenge that many open-source models have struggled with, and Apple states that this makes Manzano more comparable to commercial systems such as those from OpenAI and Google in image-processing efficiency and performance.


Manzano has not yet been released or demonstrated publicly. However, Apple's research team shared a paper along with low-resolution image samples that demonstrate the model's ability to handle complex prompts. These samples were compared with outputs from the open-source model DeepSeek Janus Pro and the commercial systems GPT-4o and Gemini 2.5 Flash Image Generation (also known as "Nano Banana"). On three challenging prompts, Manzano performed comparably to OpenAI's GPT-4o and Google's Nano Banana.

Apple points out that most open-source models share a core limitation: they typically must trade off strong image analysis against strong image generation, whereas commercial systems handle both. Existing models struggle most on text-heavy tasks, such as reading documents or interpreting charts.

At the core of Manzano's design is a hybrid image tokenizer that outputs two types of tokens: continuous tokens, which represent images as floating-point values for understanding, and discrete tokens, which map image content to a fixed vocabulary of categories for generation. Because both token types come from the same encoder, conflicts that arise in traditional dual-purpose models are reduced.
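The idea can be illustrated with a toy sketch (this is not Apple's code; the dimensions, the linear encoder, and the VQ-style nearest-codebook lookup are illustrative assumptions): one shared encoder produces patch features, a continuous adapter keeps them as floats, and a discrete adapter snaps each feature to its nearest codebook entry.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy hybrid image tokenizer (illustrative only, not Apple's implementation).
PATCH_DIM, EMBED_DIM, CODEBOOK = 48, 32, 256
W_enc = rng.normal(size=(PATCH_DIM, EMBED_DIM)) * 0.1   # shared encoder weights
W_cont = rng.normal(size=(EMBED_DIM, EMBED_DIM)) * 0.1  # continuous adapter
codebook = rng.normal(size=(CODEBOOK, EMBED_DIM))       # discrete vocabulary

def tokenize(patches):
    feats = patches @ W_enc                  # one encoder feeds both paths
    continuous = feats @ W_cont              # float tokens (understanding)
    # squared distance to every codebook entry -> nearest id per patch
    d = ((feats[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)
    discrete = d.argmin(axis=1)              # integer tokens (generation)
    return continuous, discrete

patches = rng.normal(size=(16, PATCH_DIM))   # 16 flattened image patches
cont, disc = tokenize(patches)
print(cont.shape)   # (16, 32)  float tokens for understanding
print(disc.shape)   # (16,)     integer ids for generation
```

Because `feats` is computed once and shared, the two token streams stay consistent with each other, which is the property the paper credits for reducing understanding/generation conflicts.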

During training, Manzano uses continuous and discrete adapters to adapt the language model's decoder; at inference time, these provide the two data streams needed for understanding and generating images. The architecture consists of three main parts: the hybrid tokenizer, a unified language model, and a separate image decoder for the final output. Apple built three image decoders of varying size (90 million, 175 million, and 352 million parameters), supporting resolutions ranging from 256 to 2048 pixels.
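A minimal data-flow sketch of these three components, with toy placeholder classes (none of these names are Apple's API), shows how the two paths route through the shared language model:

```python
# Toy routing sketch: hybrid tokenizer -> unified LLM -> image decoder.
# All classes are illustrative stand-ins, not Apple's implementation.

class ToyTokenizer:
    def __call__(self, image):
        cont = [float(p) for p in image]           # continuous tokens (floats)
        disc = [int(p * 10) % 256 for p in image]  # discrete token ids (ints)
        return cont, disc

class ToyLLM:
    def answer(self, question, image_tokens):
        # understanding path: continuous tokens condition the text decoder
        return f"answer to {question!r} using {len(image_tokens)} image tokens"

    def generate_image_tokens(self, prompt):
        # generation path: the LLM emits discrete ids autoregressively
        return [hash(prompt + str(i)) % 256 for i in range(4)]

class ToyImageDecoder:
    def render(self, token_ids):
        # a separate pixel decoder turns discrete ids into the final image
        return f"<image from {len(token_ids)} discrete tokens>"

tokenizer, llm, decoder = ToyTokenizer(), ToyLLM(), ToyImageDecoder()

cont, _ = tokenizer([0.1, 0.5, 0.9])             # understanding path
print(llm.answer("what is in the chart?", cont))

ids = llm.generate_image_tokens("a red apple")   # generation path
print(decoder.render(ids))                        # prints "<image from 4 discrete tokens>"
```

The point of the modular split is visible here: swapping in a larger `ToyImageDecoder` (as Apple did with its 90M/175M/352M variants) would not require touching the tokenizer or the language model.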

Apple's test results show Manzano performing strongly across multiple benchmarks, especially on text-heavy tasks such as chart and document analysis, where the 3-billion-parameter version stood out. The study also found that performance improved steadily as the model scaled from 300 million to 3 billion parameters.


Manzano handles not only classic image editing but also newer tasks such as prompt-based editing, style transfer, inpainting, outpainting, and depth estimation. Apple positions Manzano as a viable alternative to existing models and believes its modular design may have a lasting impact on future multimodal AI.

Paper: https://arxiv.org/abs/2509.16197

Key Points:   

🌟 Manzano is a new image model that can perform both image understanding and generation simultaneously.   

🔍 Apple's research shows that Manzano performs well in handling complex text tasks, approaching the level of commercial systems.   

⚙️ The model uses a hybrid image tokenizer, reducing conflicts between image understanding and generation.