Abstract
This paper addresses the problem of selecting appearance features for multiple object tracking (MOT) in urban scenes. Over the years, a large number of features has been used for MOT. However, it is not clear whether some of them are better than others. Commonly used features are color histograms, histograms of oriented gradients, deep features from convolutional neural networks and re-identification (ReID) features. In this study, we assess how good these features are at discriminating objects enclosed by a bounding box in urban scene tracking scenarios. Several affinity measures, namely the $\mathrm{L}_1$, $\mathrm{L}_2$ and the Bhattacharyya distances, Rank-1 counts and the cosine similarity, are also assessed for their impact on the discriminative power of the features. Results on several datasets show that features from ReID networks are the best for discriminating instances from one another regardless of the quality of the detector. If a ReID model is not available, color histograms may be selected if the detector has a good recall and there are few occlusions; otherwise, deep features are more robust to detectors with lower recall. The project page is http://www.mehdimiah.com/visual_features.
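
To make the affinity measures named above concrete, the following is a minimal sketch of four of them applied to two feature vectors. It is an illustration under stated assumptions, not the implementation used in this study: the function names are hypothetical, features are plain Python lists, and the Bhattacharyya distance is taken in its common form $-\ln \sum_i \sqrt{p_i q_i}$ over histograms normalized to sum to one (other variants exist). Rank-1 counts are omitted because their definition depends on the evaluation protocol.

```python
import math


def l1_distance(a, b):
    """Sum of absolute coordinate differences."""
    return sum(abs(x - y) for x, y in zip(a, b))


def l2_distance(a, b):
    """Euclidean distance."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def bhattacharyya_distance(p, q):
    """-ln of the Bhattacharyya coefficient (assumed variant).

    Inputs are non-negative histograms with a positive sum; they are
    normalized to sum to one before the coefficient is computed.
    """
    sp, sq = sum(p), sum(q)
    bc = sum(math.sqrt((x / sp) * (y / sq)) for x, y in zip(p, q))
    bc = min(bc, 1.0)  # guard against rounding slightly above 1
    return -math.log(bc) if bc > 0 else math.inf


def cosine_similarity(a, b):
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)
```

Note that the three distances decrease as two detections look more alike, whereas the cosine similarity increases, so the direction must be accounted for when comparing measures.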