Monocular 3D Object Detection Leveraging Accurate Proposals and Shape Reconstruction

Abstract

We present MonoPSR, a monocular 3D object detection method that leverages proposals and shape reconstruction. First, using the fundamental relations of a pinhole camera model, detections from a mature 2D object detector are used to generate a 3D proposal per object in a scene. The 3D locations of these proposals prove to be quite accurate, which greatly reduces the difficulty of regressing the final 3D bounding box detection. Simultaneously, a point cloud is predicted in an object-centered coordinate system to learn local scale and shape information. The key challenge, however, is how to exploit this shape information to guide 3D localization. To address it, we devise aggregate losses, including a novel projection alignment loss, to jointly optimize these tasks in the neural network and improve 3D localization accuracy. We validate our method on the KITTI benchmark, where we set new state-of-the-art results among published monocular methods, including on the harder pedestrian and cyclist classes, while maintaining an efficient run time.
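The proposal step rests on similar triangles in the pinhole model: an object of physical height H at depth z spans roughly h = f_y * H / z pixels, so its depth can be estimated as z = f_y * H / h. The sketch below illustrates this under assumptions the abstract does not state: a per-class average 3D height prior, the 2D box centre back-projected as the object centre, and a plain L1 pixel distance standing in for the projection alignment loss. The names Intrinsics, proposal_from_box, project and projection_alignment_loss are hypothetical; this is a minimal illustration, not the authors' implementation.

```python
from dataclasses import dataclass


@dataclass
class Intrinsics:
    """Pinhole camera intrinsics (hypothetical container)."""
    fx: float
    fy: float
    cx: float
    cy: float


def proposal_from_box(box, height_prior, K):
    """Estimate a 3D centre (x, y, z) in camera coordinates from a 2D box.

    box: (x1, y1, x2, y2) in pixels.
    height_prior: assumed average 3D height of the object class, in metres.
    """
    x1, y1, x2, y2 = box
    box_h = y2 - y1
    if box_h <= 0:
        raise ValueError("2D box must have positive height")
    # Similar triangles: pixel height h = fy * H / z, hence z = fy * H / h.
    z = K.fy * height_prior / box_h
    # Back-project the 2D box centre to depth z (assumed object centre).
    u = 0.5 * (x1 + x2)
    v = 0.5 * (y1 + y2)
    x = (u - K.cx) * z / K.fx
    y = (v - K.cy) * z / K.fy
    return (x, y, z)


def project(points, K):
    """Project camera-frame 3D points (X, Y, Z) with Z > 0 to pixels (u, v)."""
    return [(K.fx * X / Z + K.cx, K.fy * Y / Z + K.cy) for X, Y, Z in points]


def projection_alignment_loss(local_points, centroid, target_pixels, K):
    """Mean L1 pixel error after placing an object-centred point cloud at centroid.

    Hypothetical stand-in for a projection alignment loss: local_points are
    offsets from the object centre; target_pixels are matching 2D locations.
    """
    ox, oy, oz = centroid
    cam_points = [(X + ox, Y + oy, Z + oz) for X, Y, Z in local_points]
    projected = project(cam_points, K)
    total = 0.0
    for (u, v), (tu, tv) in zip(projected, target_pixels):
        total += abs(u - tu) + abs(v - tv)
    return total / len(projected)
```

For example, with f_x = f_y = 700, principal point (600, 180), a 70-pixel-tall box and a 1.75 m height prior, the proposal depth is 700 * 1.75 / 70 = 17.5 m. Note that KITTI annotates object locations at the bottom centre of the 3D box, so a real pipeline would shift the proposal accordingly.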
