Abstract
3D human pose and shape estimation (a.k.a. "human mesh recovery") has achieved substantial progress. Researchers mainly focus on the development of novel algorithms, while less attention has been paid to other critical factors involved. This can lead to suboptimal baselines, hindering fair and faithful evaluation of newly designed methodologies. To address this problem, this work presents the first comprehensive benchmarking study from three under-explored perspectives beyond algorithms. 1) Datasets. An analysis of 31 datasets reveals the distinct impacts of data samples: datasets featuring critical attributes (i.e., diverse poses, shapes, camera characteristics, backbone features) are more effective. Strategic selection and combination of high-quality datasets can yield a significant boost to model performance. 2) Backbones. Experiments with 10 backbones, ranging from CNNs to transformers, show that the knowledge learnt from a proximity task is readily transferable to human mesh recovery. 3) Training strategies. Proper augmentation techniques and loss designs are crucial. With the above findings, we achieve a PA-MPJPE of 47.3 mm on the 3DPW test set with a relatively simple model. More importantly, we provide strong baselines for fair comparisons of algorithms, and recommendations for building effective training configurations in the future. The codebase is available at http://github.com/smplbody/hmr-benchmarks
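
For readers unfamiliar with the headline metric, PA-MPJPE is commonly understood as the mean per-joint position error measured after the prediction has been rigidly aligned to the ground truth with a similarity transform (Procrustes alignment). The following is a minimal sketch of that common definition, not the evaluation code from the codebase above. The function name `pa_mpjpe`, the `(J, 3)` input layout, and the use of NumPy are assumptions made for illustration.

```python
import numpy as np


def pa_mpjpe(pred, gt):
    """Procrustes-aligned mean per-joint position error.

    pred, gt: arrays of shape (J, 3) holding J joint positions in the
    same unit (e.g. mm). The layout is an assumption of this sketch.
    """
    pred = np.asarray(pred, dtype=float)
    gt = np.asarray(gt, dtype=float)

    # Remove translation by centering both joint sets.
    mu_pred = pred.mean(axis=0)
    mu_gt = gt.mean(axis=0)
    p = pred - mu_pred
    g = gt - mu_gt

    # Optimal rotation from the SVD of the cross-covariance (Kabsch/Umeyama).
    H = p.T @ g
    U, S, Vt = np.linalg.svd(H)

    # Flip the last axis if needed so that R is a proper rotation.
    d = 1.0 if np.linalg.det(Vt.T @ U.T) >= 0 else -1.0
    D = np.array([1.0, 1.0, d])
    R = Vt.T @ np.diag(D) @ U.T

    # Optimal isotropic scale for the centered prediction.
    scale = (S * D).sum() / (p ** 2).sum()

    # Apply the similarity transform and average the per-joint distances.
    aligned = scale * p @ R.T + mu_gt
    return np.linalg.norm(aligned - gt, axis=1).mean()
```

Because the alignment removes global rotation, translation, and scale, the metric reflects articulated pose accuracy rather than global placement. A prediction that differs from the ground truth only by such a transform scores approximately zero.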