Abstract
Estimating the 6D pose of objects from images is an important problem in various applications such as robot manipulation and virtual reality. While direct regression from images to object poses has limited accuracy, matching rendered images of an object against the observed image can produce accurate results. In this work, we propose a novel deep neural network for 6D pose matching named DeepIM. Given an initial pose estimate, our network iteratively refines the pose by matching the rendered image against the observed image. The network is trained to predict a relative pose transformation using an untangled representation of 3D location and 3D orientation and an iterative training process. Experiments on two commonly used benchmarks for 6D pose estimation demonstrate that DeepIM achieves large improvements over state-of-the-art methods. We furthermore show that DeepIM is able to match previously unseen objects.
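
The iterative refinement described above can be illustrated with a minimal sketch. It is not the DeepIM implementation: the network and renderer are replaced by a hypothetical toy predictor (`make_toy_predictor`) that already knows the target pose, and the untangled representation is simplified to the assumption that rotation and translation are updated independently of each other. All function names here are illustrative.

```python
import numpy as np


def apply_untangled_update(R_src, t_src, R_delta, t_delta):
    """Apply a relative pose update with rotation and translation decoupled.

    Simplifying assumption: the rotation update does not affect the
    translation, and the translation update is a plain additive offset.
    """
    return R_delta @ R_src, t_src + t_delta


def refine_pose(R_init, t_init, predict_delta, num_iters=4):
    """Iteratively refine a pose estimate using a relative-pose predictor."""
    R, t = R_init, t_init
    for _ in range(num_iters):
        # In a real system the object would be rendered at (R, t) and a
        # network would compare the rendering with the observed image.
        # Here predict_delta is a hypothetical stand-in for that step.
        R_delta, t_delta = predict_delta(R, t)
        R, t = apply_untangled_update(R, t, R_delta, t_delta)
    return R, t


def rot_z(theta):
    """Rotation matrix for an angle theta (radians) about the z-axis."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])


def make_toy_predictor(R_target, t_target, step=0.5):
    """Hypothetical oracle used only for illustration.

    It returns the full residual rotation and a fraction `step` of the
    residual translation, so several iterations are needed to converge.
    """
    def predict(R, t):
        R_delta = R_target @ R.T
        t_delta = step * (t_target - t)
        return R_delta, t_delta
    return predict


R_target = rot_z(0.3)
t_target = np.array([0.1, -0.2, 1.0])
predictor = make_toy_predictor(R_target, t_target)

R_init = np.eye(3)
t_init = np.array([0.0, 0.0, 0.8])
R_est, t_est = refine_pose(R_init, t_init, predictor, num_iters=10)
```

Because each step only partially corrects the translation, the estimate approaches the target over several iterations, which mirrors the idea of repeated refinement from an initial pose.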