Abstract
In this paper, we introduce a method for reconstructing 3D humans from a single image using a biomechanically accurate skeleton model. To achieve this, we train a transformer that takes an image as input and estimates the parameters of the model. Due to the lack of training data for this task, we build a pipeline to produce pseudo ground truth model parameters for single images and implement a training procedure that iteratively refines these pseudo labels. Compared to state-of-the-art methods for 3D human mesh recovery, our model achieves competitive performance on standard benchmarks, while it significantly outperforms them in settings with extreme 3D poses and viewpoints. Additionally, we show that previous reconstruction methods frequently violate joint angle limits, leading to unnatural rotations. In contrast, our approach leverages the biomechanically plausible degrees of freedom, making more realistic joint rotation estimates. We validate our approach across multiple human pose estimation benchmarks. We make the code, models and data available at: https://isshikihugh.github.io/HSMR/
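
To make the notion of a joint angle limit violation concrete, the following is a minimal sketch of how a violation rate over predicted joint angles could be computed. It is an illustration under stated assumptions, not the evaluation protocol of the paper: the joint names, the limit values, and the function `violation_rate` are all hypothetical.

```python
import math

# Hypothetical per-joint limits in radians. These values are illustrative
# assumptions only and are not taken from the paper or its skeleton model.
JOINT_LIMITS = {
    "knee_flexion": (0.0, math.radians(140.0)),
    "elbow_flexion": (0.0, math.radians(150.0)),
}


def violation_rate(predictions, limits=JOINT_LIMITS, tol=1e-6):
    """Return the fraction of (sample, joint) angles outside their limits.

    `predictions` is a list of dicts mapping joint name to angle in radians.
    """
    total = 0
    violations = 0
    for sample in predictions:
        for name, angle in sample.items():
            lower, upper = limits[name]
            total += 1
            if angle < lower - tol or angle > upper + tol:
                violations += 1
    return violations / total if total else 0.0


# Two hypothetical predictions: the second one hyperextends the knee
# and over-flexes the elbow, so two of the four angles are out of range.
preds = [
    {"knee_flexion": math.radians(30.0), "elbow_flexion": math.radians(90.0)},
    {"knee_flexion": math.radians(-20.0), "elbow_flexion": math.radians(160.0)},
]
rate = violation_rate(preds)
```

A model that parameterizes each joint only by its biomechanically plausible degrees of freedom can keep such a rate low by construction, whereas a model that predicts unconstrained 3D rotations per joint has no such guarantee.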