Towards Dense People Detection with Deep Learning and Depth images

Abstract

This paper proposes a DNN-based system that detects multiple people from a single depth image. Our neural network processes a depth image and outputs a likelihood map in image coordinates, where each detection corresponds to a Gaussian-shaped local distribution, centered at the person's head. The likelihood map encodes both the number of detected people and their 2D image positions, and can be used to recover the 3D position of each person using the depth image and the camera calibration parameters. Our architecture is compact, using separated convolutions to increase performance, and runs in real-time with low budget GPUs. We use simulated data for initially training the network, followed by fine tuning with a relatively small amount of real data. We show this strategy to be effective, producing networks that generalize to work with scenes different from those used during training. We thoroughly compare our method against the existing state-of-the-art, including both classical and DNN-based solutions. Our method outperforms existing methods and can accurately detect people in scenes with significant occlusions.
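To make the role of the likelihood map concrete, the following is a minimal sketch of how detections might be read out of such a map and lifted to 3D. It is not the authors' implementation: the helper names (`gaussian_map`, `find_peaks`, `back_project`), the 0.5 threshold, the 8-neighbour peak test, and the pinhole intrinsics `fx`, `fy`, `cx`, `cy` are all assumptions made for illustration, and the depth value is assumed to be in metres along the optical axis.

```python
import math

def gaussian_map(height, width, centers, sigma=2.0):
    """Render a toy likelihood map with one Gaussian bump per (row, col) centre.

    Stands in for the network output; the real map would come from the DNN.
    """
    return [[max((math.exp(-((r - cr) ** 2 + (c - cc) ** 2) / (2 * sigma ** 2))
                  for cr, cc in centers), default=0.0)
             for c in range(width)]
            for r in range(height)]

def find_peaks(lmap, threshold=0.5):
    """Return (row, col) of strict local maxima above `threshold`.

    Each peak is taken as one detected head; the number of peaks is the
    number of detected people. Threshold and neighbourhood are assumptions.
    """
    h, w = len(lmap), len(lmap[0])
    peaks = []
    for r in range(h):
        for c in range(w):
            v = lmap[r][c]
            if v < threshold:
                continue
            neighbours = [lmap[rr][cc]
                          for rr in range(max(0, r - 1), min(h, r + 2))
                          for cc in range(max(0, c - 1), min(w, c + 2))
                          if (rr, cc) != (r, c)]
            if all(v > n for n in neighbours):
                peaks.append((r, c))
    return peaks

def back_project(row, col, depth_m, fx, fy, cx, cy):
    """Pinhole back-projection of a pixel with known depth to camera coordinates.

    fx, fy, cx, cy are the (assumed) calibrated focal lengths and principal
    point in pixels; depth_m is the depth image value at the peak, in metres.
    """
    x = (col - cx) * depth_m / fx
    y = (row - cy) * depth_m / fy
    return (x, y, depth_m)

# Two simulated heads in a 20x30 map.
likelihood = gaussian_map(20, 30, [(5, 7), (12, 20)])
heads = find_peaks(likelihood)
```

In this sketch, `heads` holds one image position per detected person, and calling `back_project` with the depth image value at each position and the camera intrinsics yields a 3D point in the camera frame. A practical system would likely also need to handle invalid depth readings at the peak pixel, which is omitted here.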
