StepFun (阶跃星辰) has officially released and open-sourced Step1X-3D, its large 3D generation model. The release is the company's latest result in multimodal AI and extends its coverage beyond image, video, voice, and music into 3D.
Step1X-3D has 4.8 billion parameters in total: 1.3 billion in the geometry module and 3.5 billion in the texture module. Built on a solid data foundation and a 3D-native architecture, the model generates high-fidelity, controllable 3D content. It aims not only for visual "beauty" but also for "usability" and "controllability," with the goal of providing a powerful and reliable technical engine for 3D content creation.
Step1X-3D targets three key challenges in 3D content generation: data, generation quality, and controllability. First, the model rests on joint optimization of data and algorithms. The team rigorously filtered and processed more than 5 million raw samples into a training set of 2 million high-quality, standardized samples, addressing the industry's bottlenecks of data scarcity and inconsistent quality. An enhanced mesh-to-SDF (signed distance function) conversion technique, among other methods, raises the success rate of watertight geometry conversion by 20%. This keeps what the model learns precise at the source and the final generation efficient, and it gives Step1X-3D strong generalization and detail-capturing ability.
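For readers unfamiliar with the term, a mesh is usually called watertight when its surface is closed, with no holes or dangling faces. The sketch below illustrates one common edge-based check for triangle meshes: every edge must be shared by exactly two faces. It is a minimal illustration only; the function name `is_watertight` and the sample data are hypothetical, and this is not a description of the conversion pipeline Step1X-3D actually uses.

```python
from collections import Counter

def is_watertight(faces):
    """Return True if every undirected edge is shared by exactly two triangles.

    `faces` is a list of (i, j, k) vertex-index triples. This is a common
    edge-manifold criterion, used here purely for illustration.
    """
    edge_counts = Counter()
    for a, b, c in faces:
        for u, v in ((a, b), (b, c), (c, a)):
            # Store edges with sorted endpoints so (u, v) and (v, u) match.
            edge_counts[(min(u, v), max(u, v))] += 1
    return bool(edge_counts) and all(n == 2 for n in edge_counts.values())

# A tetrahedron is a closed surface; dropping one face opens a hole.
tetrahedron = [(0, 1, 2), (0, 3, 1), (1, 3, 2), (2, 3, 0)]
closed_result = is_watertight(tetrahedron)
open_result = is_watertight(tetrahedron[:-1])
print(closed_result, open_result)
```

A mesh that fails this kind of check has no well-defined inside and outside, which is why watertightness matters before computing a signed distance field.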
Second, Step1X-3D adopts a 3D-native two-stage architecture that decouples geometry from texture, so that generated assets are not only visually appealing but also structurally reliable and suitable for downstream applications. The decoupling helps avoid geometric distortion and keeps results accurate, realistic, and consistent. Geometry generation uses a hybrid VAE-DiT architecture optimized for 3D data. It produces a TSDF (truncated signed distance function) internal representation, which yields complete model structures without broken faces or holes, while techniques such as sharp edge sampling capture and reproduce fine geometric detail. Texture generation builds on a heavily customized and optimized SD-XL model. It works with the geometry module through geometric guidance and multi-view synchronization in latent space, so the generated textures are rich in color, realistic, consistent across views, and closely fitted to complex 3D surfaces, avoiding common distortion and seam defects.
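As a rough illustration of what a TSDF stores, the sketch below computes the signed distance from a few points to a unit sphere and clamps it to a narrow band around the surface. Everything here is an assumption made for the example: the sphere, the sign convention (negative inside), the band width of 0.1, and the function names `sphere_sdf` and `truncate`. It does not reflect how Step1X-3D encodes or samples its own representation.

```python
import math

def sphere_sdf(point, center=(0.0, 0.0, 0.0), radius=1.0):
    """Signed distance to a sphere: negative inside, positive outside (assumed convention)."""
    return math.dist(point, center) - radius

def truncate(distance, band=0.1):
    """Clamp a signed distance to [-band, band] so only near-surface values vary."""
    return max(-band, min(band, distance))

# Points deep inside, just inside, on, and far outside the surface.
samples = [(0.0, 0.0, 0.0), (0.95, 0.0, 0.0), (1.0, 0.0, 0.0), (3.0, 0.0, 0.0)]
for p in samples:
    print(p, truncate(sphere_sdf(p)))
```

Points far from the surface all collapse to the band limits, so the representation spends its precision where the surface actually is. The zero crossing of the field marks the surface itself.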
Finally, Step1X-3D significantly improves the controllability and usability of 3D content generation. Its overall VAE-plus-diffusion architecture closely mirrors mainstream 2D generation models such as Stable Diffusion, so mature 2D control techniques, including lightweight LoRA fine-tuning, can be applied directly. Users can therefore adjust attributes of generated 3D assets, such as symmetry and surface detail (sharpness or smoothness), intuitively and precisely, keeping results closely aligned with their intent.
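LoRA is lightweight because it leaves the base weights frozen and learns only a low-rank correction. In the standard formulation, the effective weight is W + (alpha / r) * B * A, where r is the adapter rank. The sketch below shows that merge on a toy 3x3 matrix in plain Python. The matrices, the `alpha` value, and the helper names `matmul` and `lora_merge` are illustrative assumptions, not taken from the Step1X-3D code.

```python
def matmul(x, y):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*y)] for row in x]

def lora_merge(weight, lora_b, lora_a, alpha):
    """Return W + (alpha / r) * B @ A, where r is the adapter rank (rows of A)."""
    rank = len(lora_a)
    scale = alpha / rank
    delta = matmul(lora_b, lora_a)
    return [
        [w + scale * d for w, d in zip(w_row, d_row)]
        for w_row, d_row in zip(weight, delta)
    ]

# Toy values: a frozen 3x3 base weight and a rank-1 adapter (B is 3x1, A is 1x3).
weight = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
lora_b = [[1.0], [0.0], [2.0]]
lora_a = [[0.5, 0.0, 1.0]]
merged = lora_merge(weight, lora_b, lora_a, alpha=2.0)
print(merged)
```

Here the adapter stores 6 numbers instead of 9. For the large weight matrices in a diffusion model the saving is far greater, which is what makes per-attribute adapters cheap to train and swap.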
To evaluate Step1X-3D objectively, StepFun ran quantitative and qualitative assessments on an in-house benchmark of 110 diverse test cases, comparing it against several mainstream models. In the automatic evaluations, Step1X-3D performed strongly across many key dimensions. On the core metric CLIP-Score, which measures semantic consistency between the generated content and the input, it achieved the highest score among all compared models, giving the open-source community a highly competitive 3D generation solution.
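CLIP-style scores are generally built on the cosine similarity between two embedding vectors, one for the input and one for the generated result (for 3D assets, typically rendered views). The sketch below shows only that similarity step, averaged over a set of pairs. The vectors are placeholders, not real CLIP embeddings, and the exact scaling and rendering protocol used in the Step1X-3D evaluation are not specified here, so treat this as an assumption-laden illustration.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm

def mean_similarity(pairs):
    """Average cosine similarity over (input_embedding, output_embedding) pairs."""
    return sum(cosine_similarity(u, v) for u, v in pairs) / len(pairs)

# Placeholder vectors standing in for embeddings from an encoder such as CLIP.
aligned = ((3.0, 4.0), (3.0, 4.0))      # identical direction
unrelated = ((1.0, 0.0), (0.0, 1.0))    # orthogonal
print(mean_similarity([aligned, unrelated]))
```

A higher average means the generated assets stay closer, in embedding space, to what the input asked for.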
GitHub:
https://github.com/stepfun-ai/Step1X-3D
HuggingFace:
https://huggingface.co/stepfun-ai/Step1X-3D
ModelScope:
https://www.modelscope.cn/models/stepfun-ai/Step1X-3D