Zippo: Zipping Color and Transparency Distributions into a Single Diffusion Model

Abstract

Beyond the strength of text-to-image diffusion models in generating high-quality images, recent studies have attempted to uncover their potential for adapting the learned semantic knowledge to visual perception tasks. In this work, instead of translating a generative diffusion model into a visual perception model, we explore retaining its generative ability alongside the perceptive adaptation. To accomplish this, we present Zippo, a unified framework for zipping the color and transparency distributions into a single diffusion model by expanding the diffusion latent into a joint representation of RGB images and alpha mattes. By alternately selecting one modality as the condition and applying the diffusion process to the counterpart modality, Zippo is capable of generating RGB images from alpha mattes and predicting transparency from input images. In addition to single-modality prediction, we propose a modality-aware noise reassignment strategy that further enables Zippo to jointly generate RGB images and their corresponding alpha mattes under text guidance. Our experiments showcase Zippo's ability to efficiently generate text-conditioned transparent images and present plausible results for Matte-to-RGB and RGB-to-Matte translation.
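The abstract describes one joint latent holding both modalities, where the condition modality stays clean and only the counterpart is diffused, and where both can be diffused together for joint generation. The sketch below illustrates that idea under loudly stated assumptions. It concatenates the two latents along the channel axis, uses a standard DDPM forward process with a linear beta schedule, and assigns each modality its own timestep (0 meaning "clean condition"). The function names `alpha_bar` and `make_joint_input` and the mode strings are hypothetical. This is not the authors' implementation, and Zippo's actual modality-aware noise reassignment may work differently.

```python
import numpy as np

def alpha_bar(t, T=1000, beta_start=1e-4, beta_end=0.02):
    """Cumulative signal factor of a DDPM forward process.

    Assumption: linear beta schedule; t = 0 means "no noise",
    t in 1..T indexes the schedule.
    """
    if t == 0:
        return 1.0
    betas = np.linspace(beta_start, beta_end, T)
    return float(np.cumprod(1.0 - betas)[t - 1])

def make_joint_input(z_rgb, z_alpha, mode, t, rng):
    """Build a noised joint latent from RGB and alpha latents (hypothetical helper).

    mode:
      "matte2rgb": alpha is the clean condition, RGB is diffused
      "rgb2matte": RGB is the clean condition, alpha is diffused
      "joint":     both modalities are diffused with the same timestep
    Returns the joint latent (channel-concatenated, an assumption) and
    the per-modality timesteps (t_rgb, t_alpha).
    """
    t_rgb, t_alpha = {
        "matte2rgb": (t, 0),
        "rgb2matte": (0, t),
        "joint": (t, t),
    }[mode]
    noisy = []
    for z, tm in ((z_rgb, t_rgb), (z_alpha, t_alpha)):
        eps = rng.standard_normal(z.shape)
        ab = alpha_bar(tm)
        # x_t = sqrt(abar_t) * x_0 + sqrt(1 - abar_t) * eps; exactly x_0 when t = 0
        noisy.append(np.sqrt(ab) * z + np.sqrt(1.0 - ab) * eps)
    joint = np.concatenate(noisy, axis=0)  # latents shaped (C, H, W)
    return joint, (t_rgb, t_alpha)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    z_rgb = rng.standard_normal((4, 8, 8))
    z_alpha = rng.standard_normal((4, 8, 8))
    joint, ts = make_joint_input(z_rgb, z_alpha, "rgb2matte", 500, rng)
    print(joint.shape, ts)
```

In this sketch, a denoiser would receive both per-modality timesteps so it can tell which half of the latent is the condition. Switching `mode` is what lets one model cover Matte-to-RGB, RGB-to-Matte, and joint generation.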
