Image-to-image translation aims to learn the mapping between two visualdomains. There are two main challenges for many applications: 1) the lack ofaligned training pairs and 2) multiple possible outputs from a single inputimage. In this work, we present an approach based on disentangledrepresentation for producing diverse outputs without paired training images. Toachieve diversity, we propose to embed images onto two spaces: adomain-invariant content space capturing shared information across domains anda domain-specific attribute space. Our model takes the encoded content featuresextracted from a given input and the attribute vectors sampled from theattribute space to produce diverse outputs at test time. To handle unpairedtraining data, we introduce a novel cross-cycle consistency loss based ondisentangled representations. Qualitative results show that our model cangenerate diverse and realistic images on a wide range of tasks without pairedtraining data. For quantitative comparisons, we measure realism with user studyand diversity with a perceptual distance metric. We apply the proposed model todomain adaptation and show competitive performance when compared to thestate-of-the-art on the MNIST-M and the LineMod datasets.