Abstract
Diffusion models have made remarkable advances in generating high-quality images from textual descriptions. Recent works like LayerDiffuse have extended the previous single-layer, unified image generation paradigm to transparent image layer generation. However, existing multi-layer generation methods fail to handle the interactions among multiple layers, such as a coherent global layout, physically plausible contacts, and visual effects like shadows and reflections, while maintaining high alpha quality. To address this problem, we propose PSDiffusion, a unified diffusion framework for simultaneous multi-layer text-to-image generation. Our model automatically generates multi-layer images with one RGB background and multiple RGBA foregrounds through a single feed-forward process. Unlike existing methods that combine multiple tools for post-hoc decomposition or generate layers sequentially and separately, our method introduces a global-layer interactive mechanism that generates layered images concurrently and collaboratively. This ensures not only high quality and completeness for each layer, but also spatial and visual interactions among layers for global coherence.
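The layered output described above (one opaque RGB background plus several RGBA foregrounds) can be flattened into a single image with standard "over" alpha compositing. The sketch below is a generic illustration, not part of PSDiffusion. It assumes straight (non-premultiplied) alpha, float pixel values in [0, 1], and foregrounds ordered back to front. `composite_layers` is a hypothetical helper name.

```python
import numpy as np


def composite_layers(background, foregrounds):
    """Flatten an RGB background and RGBA foregrounds (back to front).

    Hypothetical helper: assumes straight alpha and floats in [0, 1].
    Because the background is opaque, the result stays opaque, and each
    step reduces to C = alpha * F + (1 - alpha) * C_prev.
    """
    out = np.asarray(background, dtype=np.float64).copy()  # shape (H, W, 3)
    for layer in foregrounds:
        layer = np.asarray(layer, dtype=np.float64)        # shape (H, W, 4)
        rgb = layer[..., :3]
        alpha = layer[..., 3:4]  # keep a trailing axis so it broadcasts over RGB
        out = alpha * rgb + (1.0 - alpha) * out
    return out


if __name__ == "__main__":
    bg = np.zeros((2, 2, 3))
    bg[...] = [0.0, 0.0, 1.0]            # blue background
    fg = np.zeros((2, 2, 4))             # fully transparent layer...
    fg[0, 0] = [1.0, 0.0, 0.0, 1.0]      # ...except an opaque red pixel
    fg[1, 1] = [1.0, 0.0, 0.0, 0.5]      # ...and a half-transparent red pixel
    print(composite_layers(bg, [fg]))
```

Because each foreground is blended onto the running result, the list order determines occlusion. A layer later in the list appears in front of earlier ones.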