We present a new method for synthesizing high-resolution photo-realistic images from semantic label maps using conditional generative adversarial networks (conditional GANs). Conditional GANs have enabled a variety of applications, but the results are often limited to low resolution and are still far from realistic. In this work, we generate visually appealing 2048 × 1024 results with a novel adversarial loss, as well as new multi-scale generator and discriminator architectures. Furthermore, we extend our framework to interactive visual manipulation with two additional features. First, we incorporate object instance segmentation information, which enables object manipulations such as removing/adding objects and changing the object category. Second, we propose a method to generate diverse results given the same input, allowing users to edit the object appearance interactively. Human opinion studies demonstrate that our method significantly outperforms existing methods, advancing both the quality and the resolution of deep image synthesis and editing.