G2L-Net: Global to Local Network for Real-time 6D Pose Estimation with Embedding Vector Features

Abstract

In this paper, we propose a novel real-time 6D object pose estimation framework, named G2L-Net. Our network operates on point clouds from RGB-D detection in a divide-and-conquer fashion. Specifically, our network consists of three steps. First, we extract the coarse object point cloud from the RGB-D image by 2D detection. Second, we feed the coarse object point cloud to a translation localization network to perform 3D segmentation and object translation prediction. Third, using the predicted segmentation and translation, we transfer the fine object point cloud into a local canonical coordinate frame, in which we train a rotation localization network to estimate the initial object rotation. In the third step, we define point-wise embedding vector features to capture viewpoint-aware information. To calculate a more accurate rotation, we adopt a rotation residual estimator to estimate the residual between the initial rotation and the ground truth, which boosts the initial pose estimation performance. Our proposed G2L-Net runs in real time despite the fact that multiple steps are stacked. Extensive experiments on two benchmark datasets show that the proposed method achieves state-of-the-art performance in terms of both accuracy and speed.
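The geometric bookkeeping behind the second and third steps can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names (`to_local_frame`, `compose_pose`, `rot_z`), the toy inputs, and in particular the choice to apply the residual rotation on the left of the initial rotation are assumptions made for illustration, and the network predictions are replaced by hand-picked values.

```python
import numpy as np

def to_local_frame(points, mask, translation):
    """Keep the segmented points and shift them so the predicted
    object translation becomes the origin (hypothetical helper)."""
    return points[mask] - translation

def compose_pose(r_init, r_residual, translation):
    """Build a 4x4 pose matrix from an initial rotation, a residual
    rotation, and a translation (hypothetical helper)."""
    pose = np.eye(4)
    # Assumption: the residual rotation is applied on the left.
    pose[:3, :3] = r_residual @ r_init
    pose[:3, 3] = translation
    return pose

def rot_z(angle):
    """Rotation matrix for a rotation about the z-axis by `angle` radians."""
    c, s = np.cos(angle), np.sin(angle)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])

# Toy stand-ins for the coarse point cloud and the network outputs.
points = np.array([[1.0, 2.0, 3.0], [1.2, 2.0, 3.0], [9.0, 9.0, 9.0]])
mask = np.array([True, True, False])      # stand-in for the 3D segmentation
translation = np.array([1.1, 2.0, 3.0])   # stand-in for the predicted translation

local_points = to_local_frame(points, mask, translation)
pose = compose_pose(rot_z(0.5), rot_z(0.1), translation)
```

In this toy setup the two segmented points end up centred on the origin, and the refined rotation is the product of the initial and residual rotations about the same axis.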
