With multiple crowd gatherings of millions of people every year in events ranging from pilgrimages to protests, concerts to marathons, and festivals to funerals, visual crowd analysis is emerging as a new frontier in computer vision. In particular, counting in highly dense crowds is a challenging problem with far-reaching applicability in crowd safety and management, as well as in gauging the political significance of protests and demonstrations. In this paper, we propose a novel approach that simultaneously solves the problems of counting, density map estimation, and localization of people in a given dense crowd image. Our formulation is based on the important observation that the three problems are inherently related to each other, which makes the loss function for optimizing a deep CNN decomposable. Since localization requires high-quality images and annotations, we introduce the UCF-QNRF dataset, which overcomes the shortcomings of previous datasets and contains 1.25 million humans manually marked with dot annotations. Finally, we present evaluation measures and a comparison with recent deep CNNs, including those developed specifically for crowd counting. Our approach significantly outperforms the state of the art on the new dataset, which is the most challenging dataset with the largest number of crowd annotations in the most diverse set of scenes.
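
To make the relation between the three problems concrete, the following is a minimal sketch in plain Python, not the network or loss proposed in this paper. It assumes the common convention of building a density map by placing a unit-mass Gaussian at each dot annotation. Under that assumption the density map sums to the count, and its local maxima recover the annotated locations. The function names, the 32x32 image size, the value of `sigma`, and the peak threshold are all hypothetical choices made for illustration.

```python
import math


def density_map(points, height, width, sigma):
    """Sum one Gaussian per (row, col) dot; each is normalized to unit mass."""
    dmap = [[0.0] * width for _ in range(height)]
    for r0, c0 in points:
        # Unnormalized Gaussian evaluated over the whole image.
        weights = [
            [math.exp(-((r - r0) ** 2 + (c - c0) ** 2) / (2.0 * sigma ** 2))
             for c in range(width)]
            for r in range(height)
        ]
        total = sum(map(sum, weights))
        # Normalize so that every annotated person contributes exactly 1.
        for r in range(height):
            for c in range(width):
                dmap[r][c] += weights[r][c] / total
    return dmap


def count(dmap):
    """Counting: integrate (sum) the density map."""
    return sum(map(sum, dmap))


def localize(dmap, threshold):
    """Localization: strict local maxima (8-neighbourhood) above a threshold."""
    h, w = len(dmap), len(dmap[0])
    peaks = []
    for r in range(h):
        for c in range(w):
            v = dmap[r][c]
            if v < threshold:
                continue
            neighbours = [
                dmap[rr][cc]
                for rr in range(max(0, r - 1), min(h, r + 2))
                for cc in range(max(0, c - 1), min(w, c + 2))
                if (rr, cc) != (r, c)
            ]
            if all(v > n for n in neighbours):
                peaks.append((r, c))
    return peaks


# Hypothetical dot annotations for three people in a 32x32 image.
points = [(5, 5), (5, 20), (20, 12)]
dmap = density_map(points, height=32, width=32, sigma=1.0)

estimated_count = count(dmap)              # approximately 3.0
estimated_points = localize(dmap, 0.05)    # peaks in row-major scan order
```

In this toy setting the annotations are well separated, so the peaks coincide with the dots. As `sigma` shrinks, the density map approaches the dot map itself, which is one way to see why the three outputs can share a single formulation; in dense crowds the Gaussians overlap and peak finding becomes much harder.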