In the fields of computer vision and graphics, the abstraction of 3D shapes is a foundational and critical research area. By breaking down complex 3D shapes into simple geometric units, researchers are able to better understand the mechanisms behind human visual perception.
However, existing 3D generation methods often fail to meet the requirements for semantic depth and interpretability in tasks such as robotic manipulation or scene understanding, while traditional shape abstraction methods frequently suffer from over-segmentation or poor generalization.
PrimitiveAnything: A Revolutionary Framework
The research team from Tencent AIPD and Tsinghua University jointly launched the PrimitiveAnything framework, which reframes shape abstraction as a primitive-assembly generation task. The framework uses a decoder-only transformer that generates variable-length sequences of primitive components conditioned on shape features, significantly improving geometric accuracy and learning efficiency.
The core of PrimitiveAnything lies in its unified and unambiguous parameterization scheme, which supports multiple types of primitive shapes. This innovative design enables the framework to effectively capture how complex shapes are decomposed into simpler components, aligning more closely with human intuitive understanding.
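To make the idea of a unified parameterization concrete, here is a minimal sketch in Python. The field names, primitive-type indices, and discretization levels below are illustrative assumptions, not the paper's exact scheme; the key point is that every primitive, regardless of type, is described by the same small set of attributes, each of which can be discretized into tokens a transformer can predict.

```python
from dataclasses import dataclass

@dataclass
class Primitive:
    """Hypothetical unified parameterization: one record per primitive,
    identical fields for every supported primitive type."""
    ptype: int            # index into the supported types (e.g. box, sphere, cylinder)
    translation: tuple    # (x, y, z) position
    rotation: tuple       # e.g. Euler angles (rx, ry, rz)
    scale: tuple          # per-axis scaling (sx, sy, sz)

    def tokens(self, n_bins=128, lo=-1.0, hi=1.0):
        """Discretize the continuous attributes into class indices so a
        transformer can predict them with a cross-entropy loss."""
        def quantize(v):
            v = min(max(v, lo), hi)              # clamp to the value range
            return int((v - lo) / (hi - lo) * (n_bins - 1))
        cont = list(self.translation) + list(self.rotation) + list(self.scale)
        return [self.ptype] + [quantize(v) for v in cont]

p = Primitive(ptype=0, translation=(0.0, 0.5, -0.25),
              rotation=(0.0, 0.0, 0.0), scale=(0.2, 0.2, 0.2))
print(p.tokens())
```

Because the token layout is identical for all primitive types, the same decoder head can generate any of them without per-type special cases.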
Autoregressive Generation: Efficient Reconstruction
PrimitiveAnything generates 3D shapes through an autoregressive approach. The type, position, rotation, and scaling properties of each primitive component are encoded and input into the transformer to predict the next component. The framework uses a cascaded decoder to model dependencies between attributes, ensuring consistency during the generation process.
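The generation loop described above can be sketched as follows. This is a toy stand-in, not the paper's model: the `decoder_logits` function below replaces a trained transformer head with seeded random logits, and the attribute names and vocabulary sizes are assumptions. What it does show is the cascaded structure, where each attribute of the current primitive is predicted conditioned on the attributes already decided.

```python
import numpy as np

rng = np.random.default_rng(0)
EOS = 0                   # hypothetical end-of-sequence type index
N_TYPES, N_BINS = 4, 32   # illustrative vocabulary sizes

def decoder_logits(context, size):
    """Stand-in for a transformer head; the real model attends over shape
    features and all previously generated primitives."""
    return rng.normal(size=size) + 0.01 * len(context)

def sample_primitive(context):
    """Cascaded decoding: type first, then each attribute conditioned on
    everything decided so far for this primitive."""
    ptype = int(np.argmax(decoder_logits(context, N_TYPES)))
    if ptype == EOS:
        return None
    attrs = {"type": ptype}
    for name in ("translation", "rotation", "scale"):
        step_ctx = context + [attrs]   # condition on partial primitive
        attrs[name] = [int(np.argmax(decoder_logits(step_ctx, N_BINS)))
                       for _ in range(3)]
    return attrs

sequence = []
for _ in range(8):                     # cap length; a trained model stops at EOS
    prim = sample_primitive(sequence)
    if prim is None:
        break
    sequence.append(prim)
print(f"generated {len(sequence)} primitives")
```

The cascade matters because attributes are not independent: a primitive's sensible scale depends on its type and position, so predicting them jointly in a fixed order keeps each primitive internally consistent.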
During training, PrimitiveAnything combines a cross-entropy loss on the discretized attributes, a Chamfer-distance loss for reconstruction accuracy, and Gumbel-Softmax for differentiable sampling; at inference time, generation continues until an end-of-sequence marker is produced. This process allows for flexible and human-like decomposition of complex 3D shapes.
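The Gumbel-Softmax trick mentioned above can be sketched in a few lines of NumPy. This is a generic illustration of the technique rather than the paper's implementation: adding Gumbel noise to the logits and applying a temperature-controlled softmax yields a "soft" sample that approaches a hard one-hot choice as the temperature drops, so a reconstruction loss can backpropagate through the discrete attribute selection.

```python
import numpy as np

def gumbel_softmax(logits, tau=1.0, rng=None):
    """Soft one-hot sample from a categorical distribution defined by logits.
    Unlike argmax sampling, this is differentiable with respect to logits."""
    rng = rng or np.random.default_rng()
    gumbel = -np.log(-np.log(rng.uniform(size=logits.shape)))  # Gumbel(0,1) noise
    y = (logits + gumbel) / tau
    y = np.exp(y - y.max())        # numerically stable softmax
    return y / y.sum()

rng = np.random.default_rng(42)
logits = np.array([2.0, 0.5, -1.0])
soft = gumbel_softmax(logits, tau=0.5, rng=rng)
print(soft)   # a probability vector; nearly one-hot at low tau
```

Lower values of `tau` make the sample harder (closer to a true one-hot vector) at the cost of higher-variance gradients, which is the usual trade-off when annealing the temperature during training.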
Human Primitive Dataset: Comprehensive Evaluation
To validate the effectiveness of the framework, the research team constructed a large-scale HumanPrim dataset containing 120,000 samples with manually annotated primitive components. Through evaluations using metrics such as Chamfer distance, Earth Mover's distance, and Hausdorff distance, PrimitiveAnything demonstrated excellent performance in reconstruction accuracy and consistency with human abstraction patterns.
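Two of the evaluation metrics named above, Chamfer distance and Hausdorff distance, are straightforward to compute on point sets sampled from the original and reconstructed surfaces. The NumPy sketch below shows standard definitions of both (Earth Mover's distance additionally requires an optimal matching and is omitted here); the point sets are made-up examples.

```python
import numpy as np

def pairwise_dist(a, b):
    """All pairwise Euclidean distances between point sets a (N,3) and b (M,3)."""
    return np.linalg.norm(a[:, None, :] - b[None, :, :], axis=-1)

def chamfer(a, b):
    """Symmetric Chamfer distance: mean nearest-neighbor distance, both directions."""
    d = pairwise_dist(a, b)
    return d.min(axis=1).mean() + d.min(axis=0).mean()

def hausdorff(a, b):
    """Symmetric Hausdorff distance: the worst-case nearest-neighbor distance."""
    d = pairwise_dist(a, b)
    return max(d.min(axis=1).max(), d.min(axis=0).max())

a = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]])
b = np.array([[0.0, 0.0, 0.0], [1.0, 0.5, 0.0]])
print(chamfer(a, b), hausdorff(a, b))
```

Chamfer distance averages errors and so rewards globally faithful reconstructions, while Hausdorff distance penalizes the single worst deviation; reporting both gives a fuller picture of reconstruction quality.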
Moreover, the framework supports generating primitive-based 3D content from text or image inputs. Users can easily edit the generated results, which retain high modeling quality while achieving over 95% storage savings compared with the original meshes, making the framework particularly suitable for efficient interactive 3D applications.
Conclusion: Efficient and Convenient 3D Generation
The PrimitiveAnything framework captures intuitive decomposition patterns by treating 3D shape abstraction as a sequence generation task and leveraging human-designed primitive components. The framework achieves high-quality generation across various object categories, demonstrating strong generalization capabilities.
With its efficiency and lightweight nature, PrimitiveAnything is highly suitable for user-generated content applications like gaming, where both performance and ease of use are crucial.
Demo: https://huggingface.co/spaces/hyz317/PrimitiveAnything