Focal Sparse Convolutional Networks for 3D Object Detection

Abstract

Non-uniform 3D sparse data, e.g., point clouds or voxels in different spatial positions, contribute to the task of 3D object detection in different ways. Existing basic components in sparse convolutional networks (Sparse CNNs) process all sparse data, whether in regular or submanifold sparse convolution. In this paper, we introduce two new modules to enhance the capability of Sparse CNNs, both based on making feature sparsity learnable with position-wise importance prediction. They are focal sparse convolution (Focals Conv) and its multi-modal variant, focal sparse convolution with fusion, or Focals Conv-F for short. The new modules can readily substitute their plain counterparts in existing Sparse CNNs and be jointly trained in an end-to-end fashion. For the first time, we show that spatially learnable sparsity in sparse convolution is essential for sophisticated 3D object detection. Extensive experiments on the KITTI, nuScenes and Waymo benchmarks validate the effectiveness of our approach. Without bells and whistles, our results outperform all existing single-model entries on the nuScenes test benchmark at the time of paper submission. Code and models are available at https://github.com/dvlab-research/FocalsConv.
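To make the idea of position-wise importance prediction more concrete, here is a minimal NumPy sketch of one plausible reading of it. It is an illustration only and is not taken from the FocalsConv repository: the function name `focal_output_positions`, the single linear projection `weight`, the sigmoid scoring, and the fixed `threshold` are all assumptions. Voxels scored as important activate their 3x3x3 neighbourhood, as a regular sparse convolution would, while the rest keep only their own position, as a submanifold sparse convolution would.

```python
import numpy as np


def focal_output_positions(coords, feats, weight, threshold=0.5):
    """Pick output voxel positions from a predicted per-voxel importance.

    coords: (N, 3) int array of active voxel coordinates.
    feats:  (N, C) float array of voxel features.
    weight: (C,) hypothetical learned projection used to score importance.
    """
    # Position-wise importance in (0, 1): sigmoid over a linear score.
    importance = 1.0 / (1.0 + np.exp(-(feats @ weight)))
    important = importance > threshold

    # 3x3x3 neighbourhood offsets, including the centre (0, 0, 0).
    r = np.arange(-1, 2)
    offsets = np.stack(np.meshgrid(r, r, r, indexing="ij"), axis=-1).reshape(-1, 3)

    # Every active voxel keeps its own position (submanifold-like behaviour).
    out = {tuple(c) for c in coords.tolist()}
    # Important voxels also activate their neighbours (regular-conv-like).
    for c in coords[important]:
        out.update(tuple(p) for p in (c + offsets).tolist())
    return importance, out
```

In a trained network the projection would be learned jointly with the detector rather than fixed by hand, and the output features would still have to be computed by the convolution itself; the sketch covers only the choice of output positions.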
