Abstract
We tackle human image synthesis, including human motion imitation, appearance transfer, and novel view synthesis, within a unified framework: once trained, a single model can handle all of these tasks. Existing task-specific methods mainly use 2D keypoints to estimate the human body structure. However, 2D keypoints express only positional information. They can neither characterize the personalized shape of a person nor model limb rotations. In this paper, we propose to use a 3D body mesh recovery module to disentangle pose and shape. This module models not only joint locations and rotations but also the personalized body shape. To preserve source information such as texture, style, color, and face identity, we propose an Attentional Liquid Warping GAN with an Attentional Liquid Warping Block (AttLWB) that propagates the source information in both image and feature spaces to the synthesized reference. Specifically, the source features are extracted by a denoising convolutional auto-encoder so that they characterize the source identity well. Furthermore, our method supports more flexible warping from multiple sources. To further improve generalization to unseen source images, we apply one/few-shot adversarial learning. The model is first trained on an extensive training set and then fine-tuned on one or a few unseen images in a self-supervised way to generate high-resolution (512 x 512 and 1024 x 1024) results. We also build a new dataset, the iPER dataset, for evaluating human motion imitation, appearance transfer, and novel view synthesis. Extensive experiments demonstrate the effectiveness of our method in preserving face identity, shape consistency, and clothing details. All code and the dataset are available at https://impersonator.org/work/impersonator-plus-plus.html.
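The abstract describes propagating source features to the target by warping and combining several source images. As a rough illustration only, the sketch below shows one generic way to do this: backward-warp each source feature map with bilinear sampling, then blend the warped maps with a per-pixel softmax over sources. The flow fields and attention logits are hypothetical inputs (in the described method the correspondences would come from the recovered 3D body meshes, and the attention from learned layers), and the function names `bilinear_warp` and `fuse_sources` are illustrative. This is not the AttLWB implementation.

```python
import numpy as np


def bilinear_warp(feat, flow):
    """Backward-warp a feature map with bilinear sampling.

    feat: (C, H, W) source features.
    flow: (H, W, 2) absolute source coordinates (x, y) for each target pixel
          (hypothetical input; assumed to come from body-mesh correspondences).
    Returns (C, H, W) warped features. Coordinates outside the image are
    clamped to the border.
    """
    C, H, W = feat.shape
    x = np.clip(flow[..., 0], 0, W - 1)
    y = np.clip(flow[..., 1], 0, H - 1)
    x0 = np.floor(x).astype(int)
    y0 = np.floor(y).astype(int)
    x1 = np.minimum(x0 + 1, W - 1)
    y1 = np.minimum(y0 + 1, H - 1)
    wx = x - x0  # horizontal interpolation weight, shape (H, W)
    wy = y - y0  # vertical interpolation weight, shape (H, W)
    top = feat[:, y0, x0] * (1 - wx) + feat[:, y0, x1] * wx
    bottom = feat[:, y1, x0] * (1 - wx) + feat[:, y1, x1] * wx
    return top * (1 - wy) + bottom * wy


def fuse_sources(feats, flows, logits):
    """Warp K source feature maps to the target and blend them per pixel.

    feats:  list of K arrays of shape (C, H, W).
    flows:  list of K arrays of shape (H, W, 2).
    logits: (K, H, W) per-pixel attention scores (hypothetical; a learned
            layer would produce these in a real model).
    """
    warped = np.stack([bilinear_warp(f, fl) for f, fl in zip(feats, flows)])  # (K, C, H, W)
    logits = logits - logits.max(axis=0, keepdims=True)  # numerical stability
    attn = np.exp(logits)
    attn = attn / attn.sum(axis=0, keepdims=True)  # softmax over the K sources
    return (attn[:, None] * warped).sum(axis=0)  # (C, H, W)


if __name__ == "__main__":
    H, W, C = 4, 5, 3
    ys, xs = np.mgrid[0:H, 0:W]
    identity = np.stack([xs, ys], axis=-1).astype(float)  # each pixel maps to itself
    rng = np.random.default_rng(0)
    a, b = rng.standard_normal((C, H, W)), rng.standard_normal((C, H, W))
    fused = fuse_sources([a, b], [identity, identity], np.zeros((2, H, W)))
    print(np.allclose(fused, (a + b) / 2))  # equal attention averages the sources
```

With equal logits the fusion reduces to a plain average of the warped sources; a learned attention map instead lets each target pixel draw mostly from whichever source shows that body region best, which is one motivation for attention-weighted rather than fixed blending.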