Abstract
Recently, hand pose estimation methods based on 3D input data have shown state-of-the-art performance, because 3D data capture more spatial information than the depth image. However, 3D voxel-based methods need a large amount of memory, while PointNet-based methods need tedious preprocessing steps such as K-nearest neighbour search for each point. In this paper, we present a novel deep learning hand pose estimation method for an unordered point cloud. Our method takes 1024 3D points as input and does not require additional information. We use the Permutation Equivariant Layer (PEL) as the basic element and propose a residual network version of PEL for the hand pose estimation task. Furthermore, we propose a voting-based scheme to merge information from individual points into the final pose output. In addition to the pose estimation task, the voting-based scheme can also provide a point cloud segmentation result without requiring ground truth for segmentation. We evaluate our method on both the NYU dataset and the Hands2017Challenge dataset. Our method outperforms recent state-of-the-art methods, and its pose accuracy is currently the best on the Hands2017Challenge dataset.
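To make the two main ingredients concrete, the following is a minimal NumPy sketch rather than the implementation evaluated in the paper. It assumes one common form of a permutation equivariant layer (a per-point linear map minus a max-pooled term), a hypothetical residual wrapper around it, and a simplified voting step in which every point proposes a pose and a softmax over the points weights the proposals. The layer widths, the random weights, the output size of 63 values, and the names `pel`, `residual_pel`, `vote`, and `forward` are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)


def init(d_in, d_out):
    """Random (lam, gam, beta) parameters for one layer (illustrative only)."""
    lam = 0.1 * rng.normal(size=(d_in, d_out))
    gam = 0.1 * rng.normal(size=(d_in, d_out))
    return lam, gam, np.zeros(d_out)


def pel(x, lam, gam, beta):
    """Assumed PEL form: x @ lam - max_over_points(x) @ gam + beta.

    x has shape (N, d_in). The max over points does not depend on point
    order, so permuting the rows of x permutes the output rows the same way.
    """
    x_max = x.max(axis=0, keepdims=True)  # shape (1, d_in)
    return x @ lam - x_max @ gam + beta


def residual_pel(x, lam, gam, beta):
    """Hypothetical residual block: input plus ReLU(PEL(input))."""
    return x + np.maximum(pel(x, lam, gam, beta), 0.0)


def vote(per_point_pose, per_point_logit):
    """Merge per-point estimates, both of shape (N, J), into one pose (J,).

    A softmax over the point axis turns the logits into voting weights.
    """
    z = per_point_logit - per_point_logit.max(axis=0, keepdims=True)
    w = np.exp(z)
    w /= w.sum(axis=0, keepdims=True)
    return (w * per_point_pose).sum(axis=0), w


N_POINTS, D_HID, D_OUT = 1024, 32, 63  # 63 output values is an assumption
w_in, w_res = init(3, D_HID), init(D_HID, D_HID)
w_pose, w_conf = init(D_HID, D_OUT), init(D_HID, D_OUT)


def forward(points):
    """Map an (N, 3) point cloud to a pose vector and per-point vote weights."""
    h = np.maximum(pel(points, *w_in), 0.0)
    h = residual_pel(h, *w_res)
    return vote(pel(h, *w_pose), pel(h, *w_conf))


points = rng.normal(size=(N_POINTS, 3))  # stand-in for a real hand point cloud
pose, weights = forward(points)

# One plausible way to read off a segmentation (an assumption, not the
# paper's stated rule): label each point by the output it votes for most.
labels = weights.argmax(axis=1)
```

Because the only interaction between points is an order-independent max and a weighted sum, shuffling the input points leaves `pose` unchanged up to floating-point rounding, which is the property that allows the network to consume an unordered point cloud without a neighbour search.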