PandaNet: Anchor-Based Single-Shot Multi-Person 3D Pose Estimation

Abstract

Recently, several deep learning models have been proposed for 3D human pose estimation. Nevertheless, most of these approaches focus only on the single-person case or estimate the 3D pose of a few people at high resolution. Furthermore, many applications such as autonomous driving or crowd analysis require pose estimation of a large number of people, possibly at low resolution. In this work, we present PandaNet (Pose estimAtioN and Detection Anchor-based Network), a new single-shot, anchor-based, multi-person 3D pose estimation approach. The proposed model performs bounding box detection and, for each detected person, 2D and 3D pose regression in a single forward pass. It does not need any post-processing to regroup joints, since the network predicts a full 3D pose for each bounding box, and it allows pose estimation of a possibly large number of people at low resolution. To manage overlapping people, we introduce a Pose-Aware Anchor Selection strategy. Moreover, as there is an imbalance between the sizes of different people in the image, and joint coordinates have different uncertainties depending on these sizes, we propose a method to automatically optimize the weights associated with different people scales and joints for efficient training. PandaNet surpasses previous single-shot methods on several challenging datasets: a virtual but very realistic multi-person urban dataset (JTA Dataset), and two real-world 3D multi-person datasets (CMU Panoptic and MuPoTS-3D).
