Apple's machine learning team has collaborated with Nanjing University and the Hong Kong University of Science and Technology to introduce an AI model named Matrix3D. The model reconstructs real-world objects and scenes from a small number of 2D photos and produces high-quality 3D output.

Users only need to provide three photos, and Matrix3D automatically generates a detailed 3D reconstruction. This simplifies the 3D modeling workflow and opens up new opportunities across a range of application fields.

Traditional 3D modeling often relies on photogrammetry, which requires many photos for measurement and modeling. Current pipelines also tend to chain several independent models, such as one for pose estimation and another for depth prediction, and this fragmented approach is prone to inefficiency and accumulated errors. Matrix3D takes a different route: it handles images, camera parameters (such as shooting angle and focal length), and depth data within a single unified architecture. This removes intermediate steps and makes the reconstruction process smoother and more reliable. The researchers noted that the integrated design significantly reduces the risk of human error and improves overall performance.

For training, Matrix3D uses a masked learning strategy inspired by early Transformer-based AI systems. The technique randomly hides part of the input data and trains the model to "fill in the blanks," which improves its adaptability. As a result, Matrix3D can learn key features effectively even when datasets are small or incomplete.
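The masking idea can be illustrated with a toy sketch. This is not Apple's implementation: the function `mask_inputs`, the `MODALITIES` tuple, and the per-view dictionary layout are hypothetical names chosen for illustration, and a real model would presumably mask learned tokens inside a Transformer rather than whole Python values. The sketch only shows the general principle of hiding random inputs and keeping them as reconstruction targets.

```python
import random

# Hypothetical modality names, mirroring the inputs the article mentions.
MODALITIES = ("image", "camera", "depth")


def mask_inputs(views, mask_ratio=0.5, seed=None):
    """Randomly hide a fraction of the (view, modality) slots.

    views: list of dicts mapping each modality name to its data.
    Returns (visible, targets):
      visible - copy of the views with hidden slots replaced by None
      targets - dict mapping (view_index, modality) to the hidden value,
                i.e. what a model would be trained to reconstruct
    """
    rng = random.Random(seed)
    slots = [(i, m) for i in range(len(views)) for m in MODALITIES]
    n_hidden = int(mask_ratio * len(slots))
    hidden = set(rng.sample(slots, n_hidden))

    visible = [
        {m: (None if (i, m) in hidden else view[m]) for m in MODALITIES}
        for i, view in enumerate(views)
    ]
    targets = {(i, m): views[i][m] for (i, m) in hidden}
    return visible, targets


if __name__ == "__main__":
    # Three placeholder views; strings stand in for real image/camera/depth data.
    views = [
        {"image": f"img{i}", "camera": f"cam{i}", "depth": f"dep{i}"}
        for i in range(3)
    ]
    visible, targets = mask_inputs(views, mask_ratio=0.5, seed=0)
    print(len(targets))  # 4 of the 9 slots are hidden
```

Because the hidden slots are drawn at random each time, the same training example can pose different tasks: with depth hidden the task resembles depth prediction, and with camera parameters hidden it resembles pose estimation.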

According to the reported test results, Matrix3D performs well: from just three input photos, the model generates detailed 3D reconstructions of both individual objects and entire environments. This gives it substantial potential for immersive technology. For example, on head-mounted devices like Apple Vision Pro, Matrix3D could create highly realistic virtual scenes and enhance the user experience. The researchers believe this capability will further promote the development of the metaverse and augmented reality.

Official introduction: https://machinelearning.apple.com/research/large-photogrammetry-model

Key points:

🌟 Matrix3D is an AI model developed by Apple in collaboration with Nanjing University and the Hong Kong University of Science and Technology that generates 3D scenes from a few 2D photos.

📸 Users only need to provide three photos to obtain a high-quality 3D reconstruction, which simplifies the workflow.

🚀 Matrix3D integrates multiple processing steps into one model, improving efficiency and reducing human error.