Abstract
We propose the task of forecasting characteristic 3D poses: from a monocular video observation of a person, to predict a future 3D pose of that person in a likely action-defining, characteristic pose - for instance, from observing a person reaching for a banana, predict the pose of the person eating the banana. Prior work on human motion prediction estimates future poses at fixed time intervals. Although easy to define, this frame-by-frame formulation confounds temporal and intentional aspects of human action. Instead, we define a semantically meaningful pose prediction task that decouples the predicted pose from time, taking inspiration from goal-directed behavior. To predict characteristic poses, we propose a probabilistic approach that first models the possible multi-modality in the distribution of likely characteristic poses. It then samples future pose hypotheses from the predicted distribution in an autoregressive fashion to model dependencies between joints and finally optimizes the resulting pose with bone length and angle constraints. To evaluate our method, we construct a dataset of manually annotated characteristic 3D poses. Our experiments with this dataset suggest that our proposed probabilistic approach outperforms state-of-the-art methods by 22% on average.