Abstract
Though tremendous strides have been made in uncontrolled face detection, accurate and efficient face localisation in the wild remains an open challenge. This paper presents a robust single-stage face detector, named RetinaFace, which performs pixel-wise face localisation on various scales of faces by taking advantage of joint extra-supervised and self-supervised multi-task learning. Specifically, we make contributions in the following five aspects: (1) We manually annotate five facial landmarks on the WIDER FACE dataset and observe significant improvement in hard face detection with the assistance of this extra supervision signal. (2) We further add a self-supervised mesh decoder branch for predicting pixel-wise 3D face shape information in parallel with the existing supervised branches. (3) On the WIDER FACE hard test set, RetinaFace outperforms the state-of-the-art average precision (AP) by $1.1\%$ (achieving AP equal to {\bf $91.4\%$}). (4) On the IJB-C test set, RetinaFace enables state-of-the-art methods (ArcFace) to improve their results in face verification (TAR=$89.59\%$ for FAR=1e-6). (5) By employing light-weight backbone networks, RetinaFace can run in real time on a single CPU core for a VGA-resolution image. Extra annotations and code will be released to facilitate future research.