Self-Supervised Pretraining of 3D Features on any Point-Cloud

Abstract

Pretraining on large labeled datasets is a prerequisite for achieving good performance in many computer vision tasks, such as 2D object recognition and video classification. However, pretraining is not widely used for 3D recognition tasks, where state-of-the-art methods train models from scratch. A primary reason is the lack of large annotated datasets, because 3D data is both difficult to acquire and time-consuming to label. We present a simple self-supervised pretraining method that can work with any 3D data: single- or multi-view, indoor or outdoor, acquired by a variety of sensors, and without 3D registration. We pretrain standard point cloud and voxel based model architectures, and show that joint pretraining further improves performance. We evaluate our models on 9 benchmarks for object detection, semantic segmentation, and object classification, where they achieve state-of-the-art results and can outperform supervised pretraining. We set a new state-of-the-art for object detection on ScanNet (69.0% mAP) and SUNRGBD (63.5% mAP). Our pretrained models are label efficient and improve performance for classes with few examples.
