Abstract
Unsupervised image-to-image translation techniques are able to map local texture between two domains, but they are typically unsuccessful when the domains require larger shape change. Inspired by semantic segmentation, we introduce a discriminator with dilated convolutions that is able to use information from across the entire image to train a more context-aware generator. This is coupled with a multi-scale perceptual loss that is better able to represent error in the underlying shape of objects. We demonstrate that this design is more capable of representing shape deformation in a challenging toy dataset, plus in complex mappings with significant dataset variation between humans, dolls, and anime faces, and between cats and dogs.