Abstract
Despite their impressive visual fidelity, existing personalized generative models lack interactive control over spatial composition and scale poorly to multiple subjects. To address these limitations, we present LayerComposer, an interactive framework for personalized, multi-subject text-to-image generation. Our approach introduces two main contributions: (1) a layered canvas, a novel representation in which each subject is placed on a distinct layer, enabling occlusion-free composition; and (2) a locking mechanism that preserves selected layers with high fidelity while allowing the remaining layers to adapt flexibly to the surrounding context. Similar to professional image-editing software, the proposed layered canvas allows users to place, resize, or lock input subjects through intuitive layer manipulation. Our versatile locking mechanism requires no architectural changes, relying instead on inherent positional embeddings combined with a new complementary data sampling strategy. Extensive experiments demonstrate that LayerComposer achieves superior spatial control and identity preservation compared to state-of-the-art methods in multi-subject personalized image generation.