Abstract
We consider an object pose estimation and model fitting problem where, given a partial point cloud of an object, the goal is to estimate the object pose by fitting a CAD model to the sensor data. We solve this problem by combining (i) a semantic keypoint-based pose estimation model, (ii) a novel self-supervised training approach, and (iii) a certification procedure that not only verifies whether the output produced by the model is correct, but also flags whether the produced solution is unique. The semantic keypoint detector is initially trained in simulation and does not perform well on real data due to the domain gap. Our self-supervised training procedure uses a corrector and a certification module to improve the detector. The corrector module corrects the detected keypoints to compensate for the domain gap, and is implemented as a declarative layer, for which we develop a simple differentiation rule. The certification module declares whether the corrected output produced by the model is certifiable (i.e., correct) or not. At each iteration, the approach optimizes the loss induced only by the certifiable input-output pairs. As training progresses, the fraction of outputs that are certifiable increases, eventually reaching nearly $100\%$ in many cases. We also introduce the notion of strong certifiability, wherein the model can determine whether the predicted object model fit is unique. The detected semantic keypoints help us implement this in the forward pass. We conduct extensive experiments to evaluate the performance of the corrector, the certification, and the proposed self-supervised training using the ShapeNet and YCB datasets, and show that the proposed approach achieves performance comparable to fully supervised baselines while not requiring pose or keypoint supervision on real data.
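To make the idea of optimizing only over certifiable input-output pairs concrete, the following is a minimal sketch and not the method as implemented in this work. It assumes, purely for illustration, that an output is declared certifiable when every input point lies within a threshold `eps` of the posed model point cloud, and that the self-supervised loss is the mean input-to-model distance. The names `one_sided_chamfer`, `certify`, `certified_loss`, and the parameter `eps` are hypothetical.

```python
import numpy as np

def one_sided_chamfer(input_pts, model_pts):
    """Distance from each input point to its nearest posed-model point.

    input_pts: (N, 3) partial point cloud; model_pts: (M, 3) posed model.
    """
    diff = input_pts[:, None, :] - model_pts[None, :, :]  # (N, M, 3)
    dist = np.linalg.norm(diff, axis=-1)                  # (N, M)
    return dist.min(axis=1)                               # (N,)

def certify(input_pts, model_pts, eps):
    """Assumed certificate: every input point is within eps of the model."""
    return bool(one_sided_chamfer(input_pts, model_pts).max() < eps)

def certified_loss(batch_inputs, batch_models, eps):
    """Average the loss over certifiable pairs only; the rest are skipped."""
    flags, losses = [], []
    for inp, mod in zip(batch_inputs, batch_models):
        ok = certify(inp, mod, eps)
        flags.append(ok)
        if ok:
            losses.append(one_sided_chamfer(inp, mod).mean())
    loss = float(np.mean(losses)) if losses else 0.0
    return loss, flags

# Toy usage: one well-fitted pair and one badly fitted pair.
model = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0],
                  [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]])
good_input = model[:2]                 # lies exactly on the model
bad_input = model[:2] + 5.0            # far from the model
loss, flags = certified_loss([good_input, bad_input], [model, model], eps=0.1)
fraction_certifiable = sum(flags) / len(flags)
```

In this toy setup only the first pair contributes to the loss, and `fraction_certifiable` is the kind of quantity one would track as training progresses. A real training loop would compute the loss in an automatic-differentiation framework so that gradients reach the keypoint detector.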