Existing deep models predict 2D and 3D kinematic poses from video that are approximately accurate but contain visible errors that violate physical constraints, such as feet penetrating the ground and bodies leaning at extreme angles. In this paper, we present a physics-based method for inferring 3D human motion from video sequences that takes initial 2D and 3D pose estimates as input. We first estimate ground contact timings with a novel prediction network that is trained without hand-labeled data. A physics-based trajectory optimization then solves for a physically plausible motion based on these inputs. We show that this process produces motions that are significantly more realistic than those from purely kinematic methods, substantially improving quantitative measures of both kinematic and dynamic plausibility. We demonstrate our method on character animation and pose estimation tasks on dynamic motions of dancing and sports with complex contact patterns.
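The abstract does not say how contacts or ground penetration are measured. To make the kinematic artifacts it targets concrete, the following is a minimal sketch of a common heuristic. It is not the learned contact network described above. A foot is flagged as in contact when it is near the ground and nearly stationary, and penetration is the depth of the foot below the ground plane. The function name `foot_contacts_and_penetration`, the y-up world frame in meters, the flat ground at `ground_y = 0`, and the threshold values are all illustrative assumptions.

```python
import numpy as np


def foot_contacts_and_penetration(foot_pos, fps=30.0, height_thresh=0.05,
                                  vel_thresh=0.2, ground_y=0.0):
    """Heuristic contact labels and ground penetration for one foot joint.

    Hypothetical helper (assumptions: y-up world frame, meters, flat ground).
    foot_pos: (T, 3) array of the foot joint's world position per frame, T >= 2.
    Returns (contacts, penetration): a (T,) bool array and a (T,) array of
    depth below the ground plane (0 wherever the foot is above it).
    """
    foot_pos = np.asarray(foot_pos, dtype=float)
    if foot_pos.ndim != 2 or foot_pos.shape[0] < 2:
        raise ValueError("foot_pos must have shape (T, 3) with T >= 2")

    # Signed height of the foot above the assumed ground plane.
    height = foot_pos[:, 1] - ground_y

    # Finite-difference speed (m/s); reuse the first value so length is T.
    speed = np.linalg.norm(np.diff(foot_pos, axis=0), axis=1) * fps
    speed = np.concatenate([speed[:1], speed])

    # In contact: close to the ground AND nearly stationary.
    contacts = (height < height_thresh) & (speed < vel_thresh)

    # Penetration: how far the foot sits below the ground (physically invalid).
    penetration = np.clip(-height, 0.0, None)
    return contacts, penetration
```

For example, a foot that rests slightly above the ground, drops 2 cm below it, and then lifts off produces contacts on the resting frames and nonzero penetration on the two frames below the plane. A thresholded heuristic like this is brittle across motions and capture setups, which is part of the motivation for predicting contacts with a trained network instead.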