Abstract
Photo-realistic re-rendering of a human from a single image with explicit control over body pose, shape and appearance enables a wide range of applications, such as human appearance transfer, virtual try-on, motion imitation and novel view synthesis. While significant progress has been made in this direction using learning-based image generation tools such as GANs, existing approaches yield noticeable artefacts, such as blurring of fine details, unrealistic distortions of body parts and garments, and severe changes in texture. We therefore propose StylePoseGAN, a new method for synthesising photo-realistic human images with explicit control over pose and part-based appearance, in which we extend a non-controllable generator to accept pose and appearance conditioning separately. Our network can be trained in a fully supervised way on human images to disentangle pose, appearance and body parts, and it significantly outperforms existing single-image re-rendering methods. Our disentangled representation opens up further applications, such as garment transfer, motion transfer, virtual try-on, head (identity) swap and appearance interpolation. StylePoseGAN achieves state-of-the-art image generation fidelity on common perceptual metrics compared to the current best-performing methods and performs convincingly in a comprehensive user study.