Abstract
Recent research on image customization (e.g., identity, subject, style, and background) has demonstrated strong customization capabilities in large-scale generative models. However, most approaches are designed for specific tasks, which restricts their ability to combine different types of conditions. Developing a unified framework for image customization remains an open challenge. In this paper, we present DreamO, an image customization framework designed to support a wide range of tasks while facilitating seamless integration of multiple conditions. Specifically, DreamO utilizes a diffusion transformer (DiT) framework to uniformly process inputs of different types. During training, we construct a large-scale training dataset that includes various customization tasks, and we introduce a feature routing constraint to facilitate the precise querying of relevant information from reference images. Additionally, we design a placeholder strategy that associates specific placeholders with conditions at particular positions, enabling control over the placement of conditions in the generated results. Moreover, we employ a progressive training strategy consisting of three stages: an initial stage focused on simple tasks with limited data to establish baseline consistency, a full-scale training stage to comprehensively enhance the customization capabilities, and a final quality alignment stage to correct quality biases introduced by low-quality data. Extensive experiments demonstrate that the proposed DreamO can effectively perform various image customization tasks with high quality and flexibly integrate different types of control conditions.