Abstract
We present a new trainable system for physically plausible markerless 3D human motion capture, which achieves state-of-the-art results in a broad range of challenging scenarios. Unlike most neural methods for human motion capture, our approach, which we dub physionical, is aware of physical and environmental constraints. It combines several key innovations in a fully differentiable way, namely (1) a proportional-derivative controller, with gains predicted by a neural network, that reduces delays even in the presence of fast motions, (2) an explicit rigid body dynamics model, and (3) a novel optimisation layer that prevents physically implausible foot-floor penetration as a hard constraint. The inputs to our system are 2D joint keypoints, which are canonicalised in a novel way so as to reduce the dependency on intrinsic camera parameters at both training and test time. This enables more accurate global translation estimation without loss of generalisability. Our model can be fine-tuned with 2D annotations alone when 3D annotations are not available. It produces smooth and physically principled 3D motions at an interactive frame rate in a wide variety of challenging scenes, including newly recorded ones. Its advantages are especially noticeable on in-the-wild sequences that differ significantly from common 3D pose estimation benchmarks such as Human3.6M and MPI-INF-3DHP. Qualitative results are available at http://gvv.mpi-inf.mpg.de/projects/PhysAware/
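
As a rough illustration of the proportional-derivative control idea named above, the following Python sketch drives a single degree of freedom toward a target position. It is not the system's implementation: the function names, the fixed gains `kp` and `kd`, the unit point mass, and the integration scheme are all assumptions made for this example. In the described system the gains are predicted by a neural network and the dynamics are those of an explicit rigid body model.

```python
def pd_control(q, qdot, q_target, kp, kd):
    """PD law: pull toward the target, damped by the current velocity."""
    return kp * (q_target - q) - kd * qdot


def simulate(q_target, kp, kd, mass=1.0, dt=0.01, steps=1000):
    """Integrate a 1-DoF point mass under the PD force (semi-implicit Euler).

    Hypothetical stand-in for a full rigid body dynamics model.
    """
    q, qdot = 0.0, 0.0
    for _ in range(steps):
        force = pd_control(q, qdot, q_target, kp, kd)
        qdot += dt * force / mass  # update velocity from acceleration
        q += dt * qdot             # update position with the new velocity
    return q, qdot


# Fixed gains chosen by hand here (assumption); kd = 2 * sqrt(kp * mass)
# gives critical damping for the unit mass.
final_q, final_qdot = simulate(q_target=1.0, kp=100.0, kd=20.0)
```

Higher proportional gains make the state follow the target more tightly, at the cost of stability margins, which is one reason hand-tuned fixed gains can lag behind fast motions.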