Single-Network Whole-Body Pose Estimation

Abstract

We present the first single-network approach for 2D whole-body pose estimation, which entails simultaneous localization of body, face, hands, and feet keypoints. Due to the bottom-up formulation, our method maintains constant real-time performance regardless of the number of people in the image. The network is trained in a single stage using multi-task learning, through an improved architecture which can handle scale differences between body/foot and face/hand keypoints. Our approach considerably improves upon OpenPose~\cite{cao2018openpose}, the only work so far capable of whole-body pose estimation, both in terms of speed and global accuracy. Unlike OpenPose, our method does not need to run an additional network for each hand and face candidate, making it substantially faster for multi-person scenarios. This work directly results in a reduction of computational complexity for applications that require 2D whole-body information (e.g., VR/AR, re-targeting). In addition, it yields higher accuracy, especially for occluded, blurry, and low-resolution faces and hands. For code, trained models, and validation benchmarks, visit our project page: https://github.com/CMU-Perceptual-Computing-Lab/openpose_train.
