Abstract
In this work, we propose an end-to-end framework to learn local multi-view descriptors for 3D point clouds. Existing studies that adopt a similar multi-view representation render from hand-crafted viewpoints in a preprocessing stage, which is detached from the subsequent descriptor-learning stage. In our framework, we integrate multi-view rendering into the neural network through a differentiable renderer. This makes the viewpoints optimizable parameters that can capture more informative local context around interest points. To obtain discriminative descriptors, we also design a soft-view pooling module that attentively fuses convolutional features across views. Extensive experiments on existing 3D registration benchmarks show that our method outperforms existing local descriptors both quantitatively and qualitatively.
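To make "attentively fuse convolutional features across views" concrete, here is a minimal sketch of attention-weighted pooling over views. It assumes each of the V views has already been encoded into a D-dimensional feature vector. The per-view score is a hypothetical dot product with a weight vector `w` that stands in for a learned scoring layer, and the function names `softmax` and `soft_view_pool` are illustrative. This is not the module architecture used in this work.

```python
import math
from typing import List, Sequence


def softmax(scores: Sequence[float]) -> List[float]:
    # Subtract the max score before exponentiating for numerical stability.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def soft_view_pool(view_feats: Sequence[Sequence[float]],
                   w: Sequence[float]) -> List[float]:
    """Fuse V per-view feature vectors (each of length D) into one D-dim vector.

    Hypothetical scoring: each view's score is the dot product of its feature
    with `w`, a stand-in for a learned layer (an assumption, not the paper's).
    """
    if not view_feats:
        raise ValueError("need at least one view")
    dim = len(view_feats[0])
    if any(len(f) != dim for f in view_feats) or len(w) != dim:
        raise ValueError("all feature vectors and w must share the same length")
    # One scalar score per view, normalized into attention weights over views.
    scores = [sum(fi * wi for fi, wi in zip(f, w)) for f in view_feats]
    weights = softmax(scores)
    # Attention-weighted sum over views, computed channel by channel.
    return [sum(a * f[d] for a, f in zip(weights, view_feats)) for d in range(dim)]


if __name__ == "__main__":
    # Zero weights give equal scores, so pooling reduces to a plain average.
    print(soft_view_pool([[1.0, 2.0], [3.0, 4.0]], [0.0, 0.0]))  # [2.0, 3.0]
```

Because the weights come from a softmax, the fusion stays differentiable with respect to every view's features. Uniform scores reduce it to average pooling, and sharply peaked scores approach selecting the single highest-scoring view.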