MPIIGaze: Real-World Dataset and Deep Appearance-Based Gaze Estimation

Abstract

Learning-based methods are believed to work well for unconstrained gaze estimation, i.e. gaze estimation from a monocular RGB camera without assumptions regarding user, environment, or camera. However, current gaze datasets were collected under laboratory conditions, and methods were not evaluated across multiple datasets. Our work makes three contributions towards addressing these limitations. First, we present the MPIIGaze dataset, which contains 213,659 full face images and corresponding ground-truth gaze positions collected from 15 users during everyday laptop use over several months. An experience sampling approach ensured continuous gaze and head poses and realistic variation in eye appearance and illumination. To facilitate cross-dataset evaluations, 37,667 images were manually annotated with eye corners, mouth corners, and pupil centres. Second, we present an extensive evaluation of state-of-the-art gaze estimation methods on three current datasets, including MPIIGaze. We study key challenges including target gaze range, illumination conditions, and facial appearance variation. We show that image resolution and the use of both eyes affect gaze estimation performance, while head pose and pupil centre information are less informative. Finally, we propose GazeNet, the first deep appearance-based gaze estimation method. GazeNet improves the state of the art by 22% (from a mean error of 13.9 degrees to 10.8 degrees) for the most challenging cross-dataset evaluation.
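As a rough illustration of the figures quoted above, the following minimal sketch recomputes the relative improvement from the two reported mean errors. It also shows one common way to measure gaze error in degrees, assuming the error is the angle between predicted and ground-truth 3D gaze direction vectors; the abstract does not spell out the metric, so that definition and the function names `angular_error_deg` and `relative_improvement` are assumptions made for illustration only.

```python
import math

def angular_error_deg(pred, true):
    """Angle in degrees between two 3D direction vectors.

    Assumption: gaze error is measured as the angle between the
    predicted and ground-truth gaze directions.
    """
    dot = sum(p * t for p, t in zip(pred, true))
    norm = math.sqrt(sum(p * p for p in pred)) * math.sqrt(sum(t * t for t in true))
    # Clamp to [-1, 1] to guard against floating-point drift before acos.
    cos_angle = max(-1.0, min(1.0, dot / norm))
    return math.degrees(math.acos(cos_angle))

def relative_improvement(baseline_error, new_error):
    """Fractional reduction in error relative to the baseline."""
    return (baseline_error - new_error) / baseline_error

# Mean errors quoted in the abstract for the cross-dataset evaluation.
print(round(100 * relative_improvement(13.9, 10.8), 1))  # 22.3
```

The result, about 22.3%, is consistent with the 22% improvement stated in the abstract.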
