Abstract
Despite recent progress, image diffusion models still produce artifacts. A common solution is to refine an established model with a quality assessment system, which generally rates an image in its entirety. In this work, we believe that problem-solving starts with identification, which requires the model to be aware not just of the presence of defects in an image but also of their specific locations. Motivated by this, we propose DiffDoctor, a two-stage pipeline to assist image diffusion models in generating fewer artifacts. Concretely, the first stage targets developing a robust artifact detector, for which we collect a dataset of over 1M flawed synthesized images and set up an efficient human-in-the-loop annotation process, incorporating a carefully designed class-balance strategy. The learned artifact detector is then used in the second stage to tune the diffusion model by assigning a per-pixel confidence map to each synthesis. Extensive experiments on text-to-image diffusion models demonstrate the effectiveness of our artifact detector as well as the soundness of our diagnose-then-treat design.