Abstract
Regression-based methods have recently shown promising results in reconstructing human meshes from monocular images. By directly mapping raw pixels to model parameters, these methods can produce parametric models in a feed-forward manner via neural networks. However, minor deviations in the parameters may lead to noticeable misalignment between the estimated meshes and the image evidence. To address this issue, we propose a Pyramidal Mesh Alignment Feedback (PyMAF) loop that leverages a feature pyramid and rectifies the predicted parameters explicitly, based on the mesh-image alignment status in our deep regressor. In PyMAF, given the currently predicted parameters, mesh-aligned evidence is extracted from finer-resolution features accordingly and fed back for parameter rectification. To reduce noise and enhance the reliability of this evidence, an auxiliary pixel-wise supervision is imposed on the feature encoder, which provides mesh-image correspondence guidance for our network to preserve the most relevant information in the spatial features. The efficacy of our approach is validated on several benchmarks, including Human3.6M, 3DPW, LSP, and COCO, where experimental results show that our approach consistently improves the mesh-image alignment of the reconstruction. Our code is publicly available at https://hongwenzhang.github.io/pymaf .
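
The feedback loop described above can be illustrated with a minimal, pure-Python sketch. It is a toy under stated assumptions rather than the released implementation: `toy_project` is a hypothetical stand-in for mesh generation and projection, the per-level regressors are random linear maps instead of trained networks, the residual update rule is assumed, and all names and sizes are illustrative.

```python
import math
import random


def sample_features(feature_map, points):
    """Nearest-neighbour sampling of a [C][H][W] map at normalised (x, y) points."""
    height, width = len(feature_map[0]), len(feature_map[0][0])
    evidence = []
    for x, y in points:
        col = min(max(round(x * (width - 1)), 0), width - 1)
        row = min(max(round(y * (height - 1)), 0), height - 1)
        evidence.extend(channel[row][col] for channel in feature_map)
    return evidence


def toy_project(theta, num_points):
    """Hypothetical stand-in for mesh generation and projection onto the image."""
    dx, dy = 0.2 * math.tanh(theta[0]), 0.2 * math.tanh(theta[1])
    base = [0.2 + 0.6 * i / (num_points - 1) for i in range(num_points)]
    return [(b + dx, b + dy) for b in base]


def feedback_loop(pyramid, theta, regressors, num_points=4):
    """Refine theta once per pyramid level, from coarse to fine."""
    for feature_map, weights in zip(pyramid, regressors):
        # Where the current estimate lands on this (finer) feature map.
        points = toy_project(theta, num_points)
        # Mesh-aligned evidence, concatenated with the current parameters.
        inputs = sample_features(feature_map, points) + list(theta)
        # Assumed residual update: the regressor predicts a correction.
        residual = [sum(w * v for w, v in zip(row, inputs)) for row in weights]
        theta = [t + r for t, r in zip(theta, residual)]
    return theta


rng = random.Random(0)
channels, num_params, num_points = 3, 5, 4

# Toy feature pyramid with increasing spatial resolution.
pyramid = [
    [[[rng.gauss(0, 1) for _ in range(size)] for _ in range(size)]
     for _ in range(channels)]
    for size in (8, 16, 32)
]

# One small random linear regressor per pyramid level (untrained).
in_dim = channels * num_points + num_params
regressors = [
    [[0.01 * rng.gauss(0, 1) for _ in range(in_dim)] for _ in range(num_params)]
    for _ in pyramid
]

theta = feedback_loop(pyramid, [0.0] * num_params, regressors, num_points)
print(len(theta))
```

The point of the sketch is the control flow: each iteration samples features at locations determined by the current estimate, so the evidence passed to the regressor depends on how well the current mesh aligns with the image, and finer levels of the pyramid are consulted as the estimate improves.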