We present a novel method for inserting objects, specifically humans, into existing images, such that they blend in a photorealistic manner while respecting the semantic context of the scene. Our method involves three subnetworks. The first generates the semantic map of the new person, given the poses of the other persons in the scene and an optional bounding box specification. The second renders the pixels of the novel person and its blending mask, based on specifications in the form of multiple appearance components. The third refines the generated face so that it matches that of the target person. Our experiments present convincing high-resolution outputs in this novel and challenging application domain. In addition, the three networks are evaluated individually, demonstrating, for example, state-of-the-art results on pose transfer benchmarks.
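
To make the role of the blending mask concrete, the following is a minimal sketch of how a rendered person and a soft mask could be combined with the existing image. It assumes standard per-pixel alpha compositing. The function name `composite`, the array shapes, and the use of NumPy are illustrative assumptions and are not taken from the method itself.

```python
import numpy as np


def composite(scene, person, mask):
    """Blend a rendered person into a scene with a soft mask (hypothetical helper).

    scene, person: uint8 arrays of shape (H, W, 3)
    mask: float array of shape (H, W), 1 where the person fully covers the scene
    """
    scene_f = scene.astype(np.float32)
    person_f = person.astype(np.float32)
    # Clamp the mask to [0, 1] and add a channel axis so it broadcasts over RGB.
    alpha = np.clip(mask.astype(np.float32), 0.0, 1.0)[..., None]
    # Assumed blending rule: convex combination of person and scene per pixel.
    blended = alpha * person_f + (1.0 - alpha) * scene_f
    return np.clip(np.rint(blended), 0, 255).astype(np.uint8)
```

With a mask value of 1 the output pixel comes entirely from the rendered person, with 0 it is the untouched scene, and intermediate values give a soft transition at the silhouette boundary.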