PCLs: Geometry-aware Neural Reconstruction of 3D Pose with Perspective Crop Layers

Abstract

Local processing is an essential feature of CNNs and other neural network architectures - it is one of the reasons why they work so well on images where relevant information is, to a large extent, local. However, perspective effects stemming from the projection in a conventional camera vary for different global positions in the image. We introduce Perspective Crop Layers (PCLs) - a form of perspective crop of the region of interest based on the camera geometry - and show that accounting for the perspective consistently improves the accuracy of state-of-the-art 3D pose reconstruction methods. PCLs are modular neural network layers, which, when inserted into existing CNN and MLP architectures, deterministically remove the location-dependent perspective effects while leaving end-to-end training and the number of parameters of the underlying neural network unchanged. We demonstrate that PCL leads to improved 3D human pose reconstruction accuracy for CNN architectures that use cropping operations, such as spatial transformer networks (STN), and, somewhat surprisingly, MLPs used for 2D-to-3D keypoint lifting. Our conclusion is that it is important to utilize camera calibration information when available, for classical and deep-learning-based computer vision alike. PCL offers an easy way to improve the accuracy of existing 3D reconstruction networks by making them geometry aware. Our code is publicly available at github.com/yu-frank/PerspectiveCropLayers.
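
To make the idea of a perspective crop based on the camera geometry concrete, the following is a minimal sketch and not the authors' implementation. It assumes a zero-skew pinhole camera with known intrinsics and builds a virtual camera whose optical axis passes through the center of the region of interest, so that the region is re-rendered as if it sat at the image center. All function and parameter names (`look_at_rotation`, `to_virtual_pixel`, `f_virt`, `c_virt`) are hypothetical and chosen only for illustration.

```python
import math

def _normalize(v):
    """Return v scaled to unit length."""
    n = math.sqrt(sum(c * c for c in v))
    return [c / n for c in v]

def _cross(a, b):
    """Cross product of two 3-vectors."""
    return [a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0]]

def look_at_rotation(fx, fy, cx, cy, center_px):
    """Rotation (as three rows) mapping original camera coordinates to a
    virtual camera whose z axis is the ray through center_px.
    Assumes a zero-skew pinhole model (an assumption of this sketch)."""
    u, v = center_px
    z = _normalize([(u - cx) / fx, (v - cy) / fy, 1.0])  # viewing ray of the crop center
    x = _normalize(_cross([0.0, 1.0, 0.0], z))           # keep x as horizontal as possible
    y = _cross(z, x)                                     # completes a right-handed frame
    return [x, y, z]

def to_virtual_pixel(R, fx, fy, cx, cy, px, f_virt, c_virt):
    """Map a pixel of the original image into the virtual (cropped) camera,
    which has focal length f_virt and principal point (c_virt, c_virt)."""
    u, v = px
    ray = [(u - cx) / fx, (v - cy) / fy, 1.0]            # back-project the pixel to a ray
    p = [sum(R[i][j] * ray[j] for j in range(3)) for i in range(3)]
    return (f_virt * p[0] / p[2] + c_virt, f_virt * p[1] / p[2] + c_virt)

# Hypothetical intrinsics and crop center, for illustration only.
R = look_at_rotation(1000.0, 1000.0, 640.0, 360.0, (1100.0, 200.0))
center_in_crop = to_virtual_pixel(R, 1000.0, 1000.0, 640.0, 360.0,
                                  (1100.0, 200.0), 1000.0, 128.0)
```

In this sketch the crop center always lands on the virtual principal point, whatever its position in the original image, which is one way to read the claim that location-dependent perspective effects can be removed deterministically. The mapping contains no learned parameters.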
