StyleFusion: A Generative Model for Disentangling Spatial Segments

Abstract

We present StyleFusion, a new mapping architecture for StyleGAN, which takes as input a number of latent codes and fuses them into a single style code. Inserting the resulting style code into a pre-trained StyleGAN generator results in a single harmonized image in which each semantic region is controlled by one of the input latent codes. Effectively, StyleFusion yields a disentangled representation of the image, providing fine-grained control over each region of the generated image. Moreover, to help facilitate global control over the generated image, a special input latent code is incorporated into the fused representation. StyleFusion operates in a hierarchical manner, where each level is tasked with learning to disentangle a pair of image regions (e.g., the car body and wheels). The resulting learned disentanglement allows one to modify both local, fine-grained semantics (e.g., facial features) as well as more global features (e.g., pose and background), providing improved flexibility in the synthesis process. As a natural extension, StyleFusion enables one to perform semantically-aware cross-image mixing of regions that are not necessarily aligned. Finally, we demonstrate how StyleFusion can be paired with existing editing techniques to more faithfully constrain the edit to the user's region of interest.
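
To make the hierarchical fusion idea concrete, the following is a minimal toy sketch, not the StyleFusion implementation. It assumes that each level blends two style codes with a per-channel mask in [0, 1]; in the toy the masks are fixed by hand, whereas the abstract describes the disentanglement as learned. The names `fuse_pair` and `fuse_hierarchy`, the tree layout, and the 8-channel codes are all hypothetical. The special global latent code is omitted because the abstract does not say how it enters the fused representation.

```python
import numpy as np

def fuse_pair(s_a, s_b, mask):
    """Blend two style codes channel-wise (assumed rule, not from the paper).

    Channels where mask is 1 come from s_a, channels where mask is 0 from s_b.
    """
    return mask * s_a + (1.0 - mask) * s_b

def fuse_hierarchy(node, codes, masks):
    """Recursively fuse a binary tree of regions into one style code.

    node  : a region name (leaf) or a tuple (name, left, right)
    codes : dict mapping leaf region name -> style code
    masks : dict mapping internal node name -> per-channel mask
    """
    if isinstance(node, str):
        return codes[node]
    name, left, right = node
    return fuse_pair(
        fuse_hierarchy(left, codes, masks),
        fuse_hierarchy(right, codes, masks),
        masks[name],
    )

# Hypothetical hierarchy: (body + wheels) -> car, then (car + background) -> image.
tree = ("image", ("car", "body", "wheels"), "background")

codes = {
    "body": np.full(8, 1.0),
    "wheels": np.full(8, 2.0),
    "background": np.full(8, 3.0),
}

# Hand-picked binary masks; a trained model would learn these instead.
masks = {
    "car": np.array([1, 1, 0, 0, 0, 0, 0, 0], dtype=float),
    "image": np.array([1, 1, 1, 1, 0, 0, 0, 0], dtype=float),
}

fused = fuse_hierarchy(tree, codes, masks)

# Swapping only the wheels code changes only the channels assigned to wheels.
codes_edit = dict(codes, wheels=np.full(8, 5.0))
fused_edit = fuse_hierarchy(tree, codes_edit, masks)
changed_channels = np.nonzero(fused != fused_edit)[0]
```

In this toy setup, `changed_channels` contains only the channels that the masks route to the wheels code, which mirrors the abstract's claim that each region is controlled by one input latent code. In an actual pipeline the fused code would then be passed to a pre-trained StyleGAN generator, a step that is not modeled here.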
