Abstract
TV shows depict a wide variety of human behaviors and have been studied extensively for their potential to be a rich source of data for many applications. However, the majority of the existing work focuses on 2D recognition tasks. In this paper, we make the observation that there is a certain persistence in TV shows, i.e., repetition of the environments and the humans, which makes possible the 3D reconstruction of this content. Building on this insight, we propose an automatic approach that operates on an entire season of a TV show and aggregates information in 3D; we build a 3D model of the environment, compute camera information, static 3D scene structure and body scale information. Then, we demonstrate how this information acts as rich 3D context that can guide and improve the recovery of 3D human pose and position in these environments. Moreover, we show that reasoning about humans and their environment in 3D enables a broad range of downstream applications: re-identification, gaze estimation, cinematography and image editing. We apply our approach on environments from seven iconic TV shows and perform an extensive evaluation of the proposed system.