Abstract
We present LayerDiffusion, an approach enabling large-scale pretrained latent diffusion models to generate transparent images. The method allows generation of single transparent images or of multiple transparent layers. The method learns a "latent transparency" that encodes alpha channel transparency into the latent manifold of a pretrained latent diffusion model. It preserves the production-ready quality of the large diffusion model by regulating the added transparency as a latent offset with minimal changes to the original latent distribution of the pretrained model. In this way, any latent diffusion model can be converted into a transparent image generator by finetuning it with the adjusted latent space. We train the model with 1M transparent image layer pairs collected using a human-in-the-loop collection scheme. We show that latent transparency can be applied to different open source image generators, or be adapted to various conditional control systems to achieve applications like foreground/background-conditioned layer generation, joint layer generation, structural control of layer contents, etc. A user study finds that in most cases (97%) users prefer our natively generated transparent content over previous ad-hoc solutions such as generating and then matting. Users also report the quality of our generated transparent images is comparable to real commercial transparent assets like Adobe Stock.