Disentangled Person Image Generation

Abstract

Generating novel yet realistic images of persons is a challenging task due to the complex interplay between the different image factors, such as the foreground, background and pose information. In this work, we generate such images with a novel two-stage reconstruction pipeline that learns a disentangled representation of these image factors and generates novel person images at the same time. First, a multi-branched reconstruction network is proposed to disentangle and encode the three factors into embedding features, which are then combined to re-compose the input image itself. Second, three corresponding mapping functions are learned in an adversarial manner to map Gaussian noise to the learned embedding feature space, one for each factor. Using the proposed framework, we can manipulate the foreground, background and pose of the input image, and also sample new embedding features to perform such targeted manipulations, which provides more control over the generation process. Experiments on the Market-1501 and DeepFashion datasets show that our model not only generates realistic person images with new foregrounds, backgrounds and poses, but also manipulates the generated factors and interpolates the in-between states. Another set of experiments on Market-1501 shows that our model can also be beneficial for the person re-identification task.
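
The data flow of the two stages described above can be illustrated with a minimal sketch. It is not the authors' implementation: fixed random linear maps stand in for trained networks, the adversarial training itself is omitted, and the names `ToyPipeline`, `make_linear`, `interpolate`, `EMB_DIM` and `NOISE_DIM`, as well as all dimensions, are hypothetical. The sketch also assumes that the three factors arrive as already separated input vectors, which the abstract does not specify.

```python
import random

FACTORS = ("foreground", "background", "pose")
EMB_DIM = 4    # assumed embedding size per factor (illustrative only)
NOISE_DIM = 3  # assumed noise size per factor (illustrative only)


def make_linear(in_dim, out_dim, rng):
    """Toy stand-in for a trained network: a fixed random linear map."""
    weights = [[rng.gauss(0.0, 0.1) for _ in range(in_dim)] for _ in range(out_dim)]

    def apply(vec):
        if len(vec) != in_dim:
            raise ValueError(f"expected {in_dim} values, got {len(vec)}")
        return [sum(w * x for w, x in zip(row, vec)) for row in weights]

    return apply


class ToyPipeline:
    """Hypothetical container for the stage-1 and stage-2 components."""

    def __init__(self, input_dims, image_dim, seed=0):
        self.rng = random.Random(seed)
        # Stage 1: one encoder branch per factor, plus a decoder that
        # re-composes the image from the combined embeddings.
        self.encoders = {f: make_linear(input_dims[f], EMB_DIM, self.rng) for f in FACTORS}
        self.decoder = make_linear(EMB_DIM * len(FACTORS), image_dim, self.rng)
        # Stage 2: one mapping from Gaussian noise to embedding space per factor
        # (learned adversarially in the paper; random and untrained here).
        self.mappers = {f: make_linear(NOISE_DIM, EMB_DIM, self.rng) for f in FACTORS}

    def encode(self, factor_inputs):
        """Encode each factor input into its own embedding."""
        return {f: self.encoders[f](factor_inputs[f]) for f in FACTORS}

    def decode(self, embeddings):
        """Concatenate the three embeddings and decode them into an image vector."""
        combined = [v for f in FACTORS for v in embeddings[f]]
        return self.decoder(combined)

    def sample_embedding(self, factor):
        """Draw Gaussian noise and map it into the embedding space of one factor."""
        noise = [self.rng.gauss(0.0, 1.0) for _ in range(NOISE_DIM)]
        return self.mappers[factor](noise)

    def generate(self, fixed=None):
        """Keep the embeddings given in `fixed` and sample the remaining factors."""
        fixed = fixed or {}
        embeddings = {f: fixed[f] if f in fixed else self.sample_embedding(f) for f in FACTORS}
        return self.decode(embeddings)


def interpolate(emb_a, emb_b, t):
    """Linear interpolation between two embeddings of the same factor."""
    return [(1.0 - t) * a + t * b for a, b in zip(emb_a, emb_b)]
```

In this sketch, calling `generate(fixed={"pose": pose_embedding})` keeps the pose of an encoded image while sampling a new foreground and background, and decoding `interpolate(a, b, t)` for several values of `t` mirrors the in-between states mentioned in the abstract.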
