Deep Reinforcement Learning for Active Human Pose Estimation

Abstract

Most 3D human pose estimation methods assume that input (be it images of a scene collected from one or several viewpoints, or from a video) is given. Consequently, they focus on estimates leveraging prior knowledge and measurement by fusing information spatially and/or temporally, whenever available. In this paper we address the problem of an active observer with freedom to move and explore the scene spatially (in 'time-freeze' mode) and/or temporally, by selecting informative viewpoints that improve its estimation accuracy. Towards this end, we introduce Pose-DRL, a fully trainable deep reinforcement learning-based active pose estimation architecture which learns to select appropriate views, in space and time, to feed an underlying monocular pose estimator. We evaluate our model using single- and multi-target estimators, with strong results in both settings. Our system further learns automatic stopping conditions in time and transition functions to the next temporal processing step in videos. In extensive experiments with the Panoptic multi-view setup, and for complex scenes containing multiple people, we show that our model learns to select viewpoints that yield significantly more accurate pose estimates compared to strong multi-view baselines.
