The X-SAM image segmentation model, jointly developed by Sun Yat-sen University, Peng Cheng Laboratory, and Meituan, was officially released recently. This large multimodal model marks a significant step forward in image segmentation, extending the capability from "segment anything" to "any segmentation" and substantially broadening the model's adaptability and range of applications.

The traditional Segment Anything Model (SAM) is effective at generating dense segmentation masks, but it has a clear limitation: it accepts only a single type of visual prompt as input. To overcome this bottleneck, the research team proposed a Visual Grounded Segmentation (VGS) task, which uses interactive visual prompts to precisely segment all matching instance objects, thereby equipping the multimodal large language model with pixel-level understanding.
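
To make the idea of the VGS task concrete, the sketch below shows one way such an interactive query could be represented: a visual prompt (a point, box, or scribble) paired with a request to segment every matching instance in the image. All class and field names here are illustrative assumptions, not X-SAM's actual interface.

```python
# Hypothetical data structures for a visual grounded segmentation query.
# These are illustrative only and do not reflect X-SAM's real API.
from dataclasses import dataclass
from typing import List, Literal, Tuple

@dataclass
class VisualPrompt:
    """An interactive visual reference drawn on the image: a point, box, or scribble."""
    kind: Literal["point", "box", "scribble"]
    coords: List[Tuple[float, float]]      # pixel coordinates defining the prompt

@dataclass
class VGSRequest:
    """Ask the model to segment *all* instances that match the visual reference."""
    image_path: str
    prompt: VisualPrompt

@dataclass
class InstanceMask:
    """One predicted instance: a binary mask plus a confidence score."""
    mask: List[List[int]]                  # H x W binary mask (0/1)
    score: float

# Example: point at one dog and expect a mask for every dog instance in the image.
request = VGSRequest(
    image_path="park.jpg",
    prompt=VisualPrompt(kind="point", coords=[(412.0, 230.0)]),
)
```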

The technical design of X-SAM incorporates several innovations. The model adopts a unified input format and output representation and can process diverse visual and textual inputs. Its dual-encoder architecture provides deep understanding of both image content and segmentation features, while a segmentation connector fuses multi-scale information, markedly improving segmentation accuracy.
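
As a rough illustration of how a dual encoder and a multi-scale connector can fit together, here is a minimal PyTorch sketch. The module names, feature dimensions, and fusion scheme are assumptions made for clarity, not X-SAM's actual implementation.

```python
# Minimal sketch of a dual-encoder backbone with a multi-scale segmentation connector.
# All names, shapes, and dimensions are illustrative assumptions, not X-SAM's code.
import torch
import torch.nn as nn

class SegmentationConnector(nn.Module):
    """Projects multi-scale segmentation features into a shared token space."""
    def __init__(self, in_dims=(256, 512, 1024), out_dim=4096):
        super().__init__()
        self.proj = nn.ModuleList(nn.Linear(d, out_dim) for d in in_dims)

    def forward(self, multi_scale_feats):
        # Project each scale separately, then concatenate along the token dimension.
        tokens = [p(f) for p, f in zip(self.proj, multi_scale_feats)]
        return torch.cat(tokens, dim=1)

class DualEncoderBackbone(nn.Module):
    """Runs a general image encoder and a segmentation encoder, then fuses their tokens."""
    def __init__(self, image_encoder, seg_encoder, connector):
        super().__init__()
        self.image_encoder = image_encoder   # e.g. a ViT for global image semantics
        self.seg_encoder = seg_encoder       # e.g. a SAM-style encoder for mask features
        self.connector = connector

    def forward(self, image):
        global_tokens = self.image_encoder(image)   # (B, N, D)
        seg_feats = self.seg_encoder(image)          # list of (B, Ni, Di) at several scales
        seg_tokens = self.connector(seg_feats)       # (B, M, D)
        return torch.cat([global_tokens, seg_tokens], dim=1)

# Toy usage with stand-in encoders (random features) just to show the tensor flow.
img_enc = lambda x: torch.randn(x.size(0), 196, 4096)
seg_enc = lambda x: [torch.randn(x.size(0), 64, d) for d in (256, 512, 1024)]
backbone = DualEncoderBackbone(img_enc, seg_enc, SegmentationConnector())
tokens = backbone(torch.randn(2, 3, 224, 224))
print(tokens.shape)   # torch.Size([2, 388, 4096])
```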


Most notably, X-SAM integrates a Mask2Former-style architecture as its segmentation decoder, allowing the model to segment multiple target objects in a single pass and removing the traditional limitation of SAM, which segments only one object per prompt. This improvement not only raises processing efficiency but also makes batch segmentation practical in complex scenes.
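
The reason a Mask2Former-style decoder can produce many masks in one pass is that it maintains a fixed set of learnable object queries, each of which attends to the pixel features and yields its own mask. The toy example below sketches this mechanism; the dimensions and layer choices are illustrative assumptions, not the model's real configuration.

```python
# Toy illustration of query-based mask decoding: N learnable queries each produce a mask,
# so all candidate instances come out of a single forward call. Shapes are illustrative.
import torch
import torch.nn as nn

class QueryBasedMaskHead(nn.Module):
    def __init__(self, num_queries=100, dim=256):
        super().__init__()
        self.queries = nn.Embedding(num_queries, dim)   # one query per potential object
        self.decoder = nn.TransformerDecoder(
            nn.TransformerDecoderLayer(d_model=dim, nhead=8, batch_first=True),
            num_layers=3,
        )

    def forward(self, pixel_feats):                     # pixel_feats: (B, HW, dim)
        B = pixel_feats.size(0)
        q = self.queries.weight.unsqueeze(0).expand(B, -1, -1)
        q = self.decoder(q, pixel_feats)                # (B, num_queries, dim)
        # Dot each query embedding with every pixel feature -> one mask per query.
        masks = torch.einsum("bqd,bpd->bqp", q, pixel_feats)
        return masks.sigmoid()                          # (B, num_queries, HW)

head = QueryBasedMaskHead()
masks = head(torch.randn(1, 64 * 64, 256))              # 100 candidate masks at once
print(masks.shape)                                       # torch.Size([1, 100, 4096])
```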

For training, the research team adopted a three-stage progressive strategy, ensuring stable performance gains through gradual learning. Evaluated comprehensively on more than 20 major segmentation datasets, X-SAM achieved strong performance on segmentation-based dialogue generation and image-text understanding tasks, validating the effectiveness of its technical approach.
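
A progressive multi-stage recipe is typically expressed as a schedule in which each stage unfreezes more components and broadens the training data mix. The sketch below shows what such a three-stage schedule might look like; the stage names, trainable components, and data mixes are assumptions for illustration, not the paper's exact recipe.

```python
# Schematic three-stage progressive schedule (illustrative assumptions only).
stages = [
    {"name": "stage1_connector_alignment", "trainable": ["connector"],
     "data": ["image-text pairs"], "epochs": 1},
    {"name": "stage2_segmentation_pretrain", "trainable": ["connector", "seg_decoder"],
     "data": ["segmentation datasets"], "epochs": 2},
    {"name": "stage3_mixed_finetune", "trainable": ["connector", "seg_decoder", "llm"],
     "data": ["segmentation + dialogue mix"], "epochs": 1},
]

for stage in stages:
    # Each stage trains a larger subset of the model on a broader data mix.
    print(f"[{stage['name']}] train {stage['trainable']} on {stage['data']} "
          f"for {stage['epochs']} epoch(s)")
```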

The release of X-SAM points to a new direction for image segmentation technology and provides an important technical foundation for building more capable general visual understanding systems. The research team stated that the next step is to explore the technology in the video domain, promoting the unified treatment of image and video segmentation and further pushing the boundaries of machine visual understanding.

This work holds significant academic value, and its potential in practical applications such as autonomous driving, medical imaging, and industrial inspection is promising. With the release of the model and wider adoption of the technology, it is expected to accelerate progress across the computer vision field.

Paper address: https://arxiv.org/pdf/2508.04655

Code address: https://github.com/wanghao9610/X-SAM

Demo address: https://47.115.200.157:7861