Abstract
Image translation methods typically aim to manipulate a set of labeled attributes (given as supervision at training time, e.g. a domain label) while leaving the unlabeled attributes intact. Current methods achieve either (i) disentanglement, which exhibits low visual fidelity and can only be satisfied when the attributes are perfectly uncorrelated, or (ii) visually plausible translations, which are clearly not disentangled. In this work, we propose OverLORD, a single framework for disentangling labeled and unlabeled attributes as well as synthesizing high-fidelity images. It is composed of two stages: (i) Disentanglement: learning disentangled representations with latent optimization. Unlike previous approaches, we do not rely on adversarial training or any architectural biases. (ii) Synthesis: training feed-forward encoders to infer the learned attributes and tuning the generator in an adversarial manner to increase perceptual quality. When the labeled and unlabeled attributes are correlated, we model an additional representation that accounts for the correlated attributes and improves disentanglement. Our flexible framework covers multiple image translation settings, e.g. attribute manipulation, pose-appearance translation, segmentation-guided synthesis and shape-texture transfer. In an extensive evaluation, we demonstrate significantly better disentanglement, higher translation quality and greater output diversity than state-of-the-art methods.
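To make the idea of "latent optimization" in the first stage concrete, the toy sketch below treats the codes themselves as free parameters: every class gets a shared labeled-attribute code, every training sample gets its own unlabeled-attribute code, and both are optimized jointly with a generator by minimizing reconstruction error, with no encoder involved. This is a minimal illustration under strong assumptions, not the authors' implementation: the generator is linear rather than a deep network, a plain L2 penalty on the per-sample codes stands in for whatever bottleneck the full method applies to the unlabeled codes, training uses full-batch gradient descent with hand-derived gradients, and the names `latent_optimization`, `d_lab`, `d_unlab` and `lam` are hypothetical. The second stage (feed-forward encoders and adversarial generator tuning) is not shown.

```python
import numpy as np


def latent_optimization(X, y, num_classes, d_lab=2, d_unlab=2,
                        lam=1e-2, lr=0.2, steps=2000, seed=0):
    """Hypothetical toy: jointly learn per-class codes E, per-sample codes C
    and a linear generator (W, b) by gradient descent, without any encoder."""
    rng = np.random.default_rng(seed)
    n, dim = X.shape
    E = 0.1 * rng.standard_normal((num_classes, d_lab))   # labeled codes, shared per class
    C = 0.1 * rng.standard_normal((n, d_unlab))            # unlabeled codes, one per sample
    W = 0.1 * rng.standard_normal((dim, d_lab + d_unlab))  # linear stand-in for the generator
    b = np.zeros(dim)
    losses = []
    for _ in range(steps):
        Z = np.concatenate([E[y], C], axis=1)      # latent code of each sample
        R = Z @ W.T + b - X                        # reconstruction residual
        # Reconstruction MSE plus an L2 penalty on the unlabeled codes (assumed regularizer).
        loss = np.mean(R ** 2) + lam * np.mean(np.sum(C ** 2, axis=1))
        losses.append(loss)
        G = 2.0 * R / R.size                       # d loss / d reconstruction
        dW = G.T @ Z
        db = G.sum(axis=0)
        dZ = G @ W                                 # gradient w.r.t. each latent code
        dE = np.zeros_like(E)
        np.add.at(dE, y, dZ[:, :d_lab])            # sum gradients of samples sharing a class
        dC = dZ[:, d_lab:] + 2.0 * lam * C / n
        # All gradients were computed above, so the updates use consistent parameters.
        E -= lr * dE
        C -= lr * dC
        W -= lr * dW
        b -= lr * db
    return E, C, W, b, losses


# Synthetic data: a class-dependent offset (labeled attribute) plus a
# per-sample scalar along a fixed direction (unlabeled attribute).
rng = np.random.default_rng(1)
n, dim, K = 64, 8, 4
y = rng.integers(0, K, size=n)
class_means = rng.standard_normal((K, dim))
direction = rng.standard_normal(dim)
X = class_means[y] + 0.5 * rng.standard_normal((n, 1)) * direction

E, C, W, b, losses = latent_optimization(X, y, K)
print(f"loss: {losses[0]:.4f} -> {losses[-1]:.4f}")
```

The design point the sketch tries to capture is that nothing predicts the codes from the images during this stage; each code is simply a learnable vector, and the class codes can only carry information shared by all samples of a class. The full method replaces the linear map with a deep generator.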