On March 31, Ant Lingbo Technology officially open-sourced LingBot-Depth-Dataset, a large-scale RGB-D depth dataset. It contains 3 million high-quality sample pairs (2 million captured in real scenes and 1 million rendered), totaling 2.71 TB and covering six mainstream depth cameras. It is currently the largest real-scene RGB-D dataset in the open-source community, and the release provides richer, more realistic data support for embodied intelligence, spatial perception, and 3D vision.


(Figure caption: Samples from the LingBot-Depth-Dataset. From top to bottom: RGB images, raw sensor depth maps, and ground-truth depth maps. The dataset provides both raw and ground-truth depth, giving strong support for training and evaluating related models in real scenes.)

Publicly available depth datasets have long been limited in scale, thin on real-scene coverage, and narrow in hardware variety. Many are primarily synthetic and differ significantly from real sensors in noise, holes, and material appearance, which greatly limits how well models trained on them perform in real environments.

The LingBot-Depth-Dataset effectively fills the data gap in spatial perception, in particular by providing large-scale data captured in real scenes. Each sample includes an RGB image, the sensor's raw depth map, and a ground-truth depth map, and can be used directly for training and evaluating depth estimation and depth completion models. The dataset covers six mainstream depth cameras: the Orbbec 335 and 335L, and the Intel RealSense D405, D415, D435, and D455, supporting model training, adaptation, and evaluation across devices and scenarios.
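To make the sample structure concrete, here is a minimal sketch of one RGB / raw-depth / ground-truth triplet. The field names and conventions (flat lists of depths in metres, with 0.0 marking an invalid pixel, i.e. a hole in the raw sensor output) are illustrative assumptions, not the dataset's actual schema:

```python
from dataclasses import dataclass

@dataclass
class DepthSample:
    """One hypothetical sample triplet: RGB, raw sensor depth, ground truth.

    Depth maps are flat row-major lists of metres; 0.0 marks an invalid
    pixel (a hole). Field names are illustrative, not the real schema.
    """
    rgb: list          # flattened RGB pixel values (placeholder)
    raw_depth: list    # raw sensor depth, may contain holes (0.0)
    gt_depth: list     # ground-truth depth, dense

    def hole_ratio(self) -> float:
        """Fraction of raw-depth pixels that are holes (0.0)."""
        return sum(1 for d in self.raw_depth if d == 0.0) / len(self.raw_depth)

# Toy 2x2 sample: one hole in the raw depth map.
sample = DepthSample(rgb=[0] * 12,
                     raw_depth=[1.2, 0.0, 0.9, 1.1],
                     gt_depth=[1.2, 1.0, 0.9, 1.1])
print(sample.hole_ratio())  # → 0.25
```

A depth completion model would be trained to recover the ground-truth values at exactly those hole pixels, which is why having both depth maps per sample matters.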

According to the team, the high-precision spatial perception model LingBot-Depth, previously open-sourced by Ant Lingbo, was trained with this dataset as its core training data. Compared with the mainstream methods PromptDA and PriorDA, LingBot-Depth reduces depth prediction error by more than 70% in indoor scenes and by about 47% on sparse depth completion tasks. With this model deployed, commercially available depth cameras can output more complete, smoother depth maps with clearer edges in challenging conditions such as transparent glass, reflective surfaces, and backlighting, without any hardware upgrade; in some scenarios their output even exceeds that of top-tier industrial depth cameras.
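The reported error reductions are relative comparisons of standard depth metrics. As a minimal sketch, here is one common metric, mean absolute relative error (AbsRel), assuming the usual convention that ground-truth zeros mark invalid pixels to be excluded (the specific metric used for the 70%/47% figures is not stated in the release):

```python
def abs_rel_error(pred, gt):
    """Mean absolute relative depth error over valid pixels.

    pred, gt: flat lists of depths in metres. Pixels where gt == 0.0
    are treated as invalid and skipped, a common evaluation convention.
    """
    pairs = [(p, g) for p, g in zip(pred, gt) if g > 0.0]
    return sum(abs(p - g) / g for p, g in pairs) / len(pairs)

# Toy example: a baseline prediction vs. an improved one.
gt       = [2.1, 1.5, 3.0, 0.8]
baseline = [2.0, 1.5, 3.3, 0.8]
improved = [2.05, 1.52, 3.02, 0.81]

e_base = abs_rel_error(baseline, gt)
e_impr = abs_rel_error(improved, gt)
print(f"error reduced by {100 * (1 - e_impr / e_base):.0f}%")
```

A "70% reduction" in this sense means the new model's AbsRel is less than a third of the baseline's on the same evaluation set.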

For universities and research institutions, this release lowers the barriers to data collection and annotation and could accelerate the transition of related techniques from research validation to practical application. As robots and embodied intelligence move into real-world scenarios, large-scale, high-quality spatial perception datasets grounded in real data will become essential infrastructure for the industry's continued progress.