Unsupervised Hard Example Mining from Videos for Improved Object Detection

Abstract

Important gains have recently been obtained in object detection by using training objectives that focus on hard negative examples, i.e., negative examples that are currently rated as positive or ambiguous by the detector. These examples can strongly influence parameters when the network is trained to correct them. Unfortunately, they are often sparse in the training data, and are expensive to obtain. In this work, we show how large numbers of hard negatives can be obtained automatically by analyzing the output of a trained detector on video sequences. In particular, detections that are isolated in time, i.e., that have no associated preceding or following detections, are likely to be hard negatives. We describe simple procedures for mining large numbers of such hard negatives (and also hard positives) from unlabeled video data. Our experiments show that retraining detectors on these automatically obtained examples often significantly improves performance. We present experiments on multiple architectures and multiple data sets, including face detection, pedestrian detection and other object categories.
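
The "isolated in time" criterion can be illustrated with a minimal sketch. This is not the paper's actual mining procedure, which the abstract does not specify. It assumes that detections are axis-aligned boxes, that a detection is "associated" with a neighbouring-frame detection when their intersection-over-union (IoU) reaches a threshold, and that only the immediately preceding and following frames are checked. The names `iou`, `isolated_detections`, and `iou_thresh` are hypothetical.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def isolated_detections(
    frames: List[List[Box]], iou_thresh: float = 0.5
) -> List[Tuple[int, Box]]:
    """Return (frame_index, box) pairs with no overlapping detection
    in either the previous or the next frame.

    Assumption: IoU matching against adjacent frames stands in for
    whatever association rule the authors actually use.
    """
    isolated = []
    for t, boxes in enumerate(frames):
        # Skip the first and last frame: each lacks one of the two neighbours.
        if t == 0 or t == len(frames) - 1:
            continue
        neighbours = frames[t - 1] + frames[t + 1]
        for box in boxes:
            if all(iou(box, other) < iou_thresh for other in neighbours):
                isolated.append((t, box))
    return isolated


# Toy example: the second box in the middle frame has no temporal support.
frames = [
    [(0, 0, 10, 10)],
    [(0, 0, 10, 10), (50, 50, 60, 60)],
    [(1, 1, 11, 11)],
]
candidates = isolated_detections(frames)
```

In this toy sequence, the box at `(50, 50, 60, 60)` appears in a single frame only, so under the assumptions above it would be flagged as a candidate hard negative for retraining.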
