Abstract
Person re-identification (re-id) aims to identify a specific person at distinct times and locations. It is challenging because of occlusion, illumination, and viewpoint changes across camera views. Recently, the multi-shot person re-id task has received more attention since it is closer to real-world applications. A key component of a good algorithm for multi-shot person re-id is the temporal aggregation of person appearance features. While most current approaches apply pooling strategies to obtain a fixed-size vector representation, doing so may lose the matching evidence between examples. In this work, we propose the idea of visual distributional representation, which interprets an image set as samples drawn from an unknown distribution in appearance feature space. Based on the supervision signals from a downstream task of interest, the method reshapes the appearance feature space and further learns the unknown distribution of each image set. In the context of multi-shot person re-id, we apply this novel concept along with the Wasserstein distance and learn a distributional set distance function between two image sets. In this way, the proper alignment between two image sets can be discovered naturally in a non-parametric manner. Our experimental results on two public datasets show the advantages of our proposed method compared to other state-of-the-art approaches.
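
To make the notion of a distributional set distance concrete, the following is a minimal illustrative sketch, not the method proposed in this work. It assumes two image sets of equal size whose appearance features are already given as fixed vectors, treats each set as a uniform empirical distribution, and uses the Euclidean distance as ground cost. Under these assumptions the 1-Wasserstein distance reduces to the cheapest one-to-one matching between the sets, which is found here by brute force and is therefore only practical for very small sets. The function names are hypothetical, and no feature learning is involved.

```python
from itertools import permutations
import math


def euclidean(u, v):
    """Euclidean distance between two feature vectors of equal length."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def wasserstein_1_equal_sets(set_a, set_b):
    """Exact 1-Wasserstein distance between two equal-size feature sets.

    Assumption: every sample carries the same weight 1/n, so an optimal
    transport plan is a one-to-one matching. The matching is found by
    enumerating all permutations, which is feasible only for tiny n.
    """
    if not set_a or len(set_a) != len(set_b):
        raise ValueError("both sets must be non-empty and of equal size")
    n = len(set_a)
    # Pairwise ground costs between every feature in set_a and set_b.
    cost = [[euclidean(a, b) for b in set_b] for a in set_a]
    # Cheapest total cost over all one-to-one alignments of the two sets.
    best_total = min(
        sum(cost[i][perm[i]] for i in range(n))
        for perm in permutations(range(n))
    )
    return best_total / n


# Hypothetical 2-D "appearance features" for two small image sets.
set_a = [(0.0, 0.0), (2.0, 0.0)]
set_b = [(0.0, 1.0), (2.0, 1.0)]
print(wasserstein_1_equal_sets(set_a, set_b))  # prints 1.0
```

Unlike a pooled fixed-size vector, this distance depends on how individual samples of the two sets align with each other, which is the matching evidence that pooling may discard.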