DreamO: A Unified Framework for Image Customization

Abstract

Recently, extensive research on image customization (e.g., identity, subject, style, and background) has demonstrated strong customization capabilities in large-scale generative models. However, most approaches are designed for specific tasks, which limits their ability to generalize to combinations of different types of conditions. Developing a unified framework for image customization remains an open challenge. In this paper, we present DreamO, an image customization framework designed to support a wide range of tasks while enabling seamless integration of multiple conditions. Specifically, DreamO uses a diffusion transformer (DiT) framework to process inputs of different types uniformly. For training, we construct a large-scale dataset that covers various customization tasks, and we introduce a feature routing constraint that facilitates precise querying of relevant information from reference images. Additionally, we design a placeholder strategy that associates specific placeholders with conditions at particular positions, enabling control over where those conditions are placed in the generated results. Moreover, we employ a progressive training strategy consisting of three stages: an initial stage focused on simple tasks with limited data to establish baseline consistency, a full-scale training stage to comprehensively enhance customization capabilities, and a final quality alignment stage to correct quality biases introduced by low-quality data. Extensive experiments demonstrate that DreamO can effectively perform various image customization tasks with high quality and flexibly integrate different types of control conditions.
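As a rough illustration of the three-stage progressive training described above, the schedule can be sketched as plain data plus a lookup that maps a global training step to its active stage. This is a minimal sketch, not DreamO's training code: the stage names, data-pool descriptions, step counts, and the `stage_at` helper are all hypothetical placeholders, since the abstract gives no concrete values.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Stage:
    name: str   # hypothetical stage label
    data: str   # hypothetical description of the data pool used in this stage
    steps: int  # hypothetical step budget; the abstract specifies no numbers


# Hypothetical schedule mirroring the three stages named in the abstract.
SCHEDULE = [
    Stage("warmup", "simple tasks, limited data", 1_000),
    Stage("full", "all customization tasks, full dataset", 10_000),
    Stage("quality_alignment", "high-quality subset", 1_000),
]


def stage_at(step: int, schedule=SCHEDULE) -> Stage:
    """Return the stage active at a 0-indexed global training step."""
    boundary = 0
    for stage in schedule:
        boundary += stage.steps  # cumulative end step of this stage (exclusive)
        if step < boundary:
            return stage
    raise ValueError(f"step {step} is beyond the schedule ({boundary} steps)")


if __name__ == "__main__":
    for s in (0, 1_000, 11_000):
        print(s, stage_at(s).name)
```

Keeping the schedule as data makes the curriculum easy to inspect and reorder; a real training loop would switch its data loader (and possibly loss weights) whenever `stage_at` returns a different stage.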
