Abstract
Deep convolutional networks (CNNs) have shown their potential to produce plausible results in image inpainting. However, in most existing methods, e.g., context encoder, the missing parts are predicted by propagating the surrounding convolutional features through a fully connected layer, which tends to produce semantically plausible but blurry results. In this paper, we introduce a special shift-connection layer to the U-Net architecture, namely Shift-Net, for filling in missing regions of any shape with sharp structures and fine-detailed textures. To this end, the encoder feature of the known region is shifted to serve as an estimation of the missing parts. A guidance loss is introduced on the decoder feature to minimize the distance between the decoder feature after the fully connected layer and the ground-truth encoder feature of the missing parts. With such a constraint, the decoder feature in the missing region can be used to guide the shift of the encoder feature in the known region. An end-to-end learning algorithm is further developed to train the Shift-Net. Experiments on the Paris StreetView and Places datasets demonstrate the efficiency and effectiveness of our Shift-Net in producing sharper, fine-detailed, and visually plausible results. The code and pre-trained models are available at https://github.com/Zhaoyi-Yan/Shift-Net.
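To make the shift operation and the guidance loss more concrete, the following is a minimal NumPy sketch of one possible reading of the two ideas, not the released implementation. Several details are assumptions that the abstract does not state: the function names `shift_features` and `guidance_loss` are hypothetical, the match between a missing location and a known location is taken to be the nearest neighbor under cosine similarity, and the "distance" in the guidance loss is taken to be the squared Euclidean distance summed over the missing region.

```python
import numpy as np


def shift_features(enc, dec, mask):
    """Illustrative shift operation (assumed form, not the authors' code).

    enc, dec: arrays of shape (C, H, W) holding encoder and decoder features.
    mask:     boolean array of shape (H, W), True where the region is missing.

    For every missing location, the decoder feature is used as a query to find
    the most similar encoder feature among the known locations (cosine
    similarity is an assumption), and that encoder feature is copied over.
    Known locations of the output are left at zero.
    """
    c, h, w = enc.shape
    enc_flat = enc.reshape(c, -1).T          # (H*W, C)
    dec_flat = dec.reshape(c, -1).T          # (H*W, C)
    m = mask.reshape(-1)

    known_idx = np.flatnonzero(~m)
    missing_idx = np.flatnonzero(m)
    shifted = np.zeros_like(enc_flat)
    if known_idx.size == 0 or missing_idx.size == 0:
        return shifted.T.reshape(c, h, w)

    eps = 1e-8                               # avoids division by zero
    known = enc_flat[known_idx]
    known_n = known / (np.linalg.norm(known, axis=1, keepdims=True) + eps)
    query = dec_flat[missing_idx]
    query_n = query / (np.linalg.norm(query, axis=1, keepdims=True) + eps)

    sim = query_n @ known_n.T                # (n_missing, n_known)
    nearest = known_idx[sim.argmax(axis=1)]  # best known match per missing location
    shifted[missing_idx] = enc_flat[nearest]
    return shifted.T.reshape(c, h, w)


def guidance_loss(dec, enc_gt, mask):
    """Squared Euclidean distance (assumed) between the decoder feature and the
    ground-truth encoder feature, summed over the missing region only."""
    diff = (dec - enc_gt)[:, mask]           # (C, n_missing)
    return float(np.sum(diff ** 2))
```

Under these assumptions, the guidance loss pulls the decoder feature in the missing region toward the ground-truth encoder feature, which is what makes it a meaningful query for selecting which known encoder feature to shift into each missing location.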