Abstract
Personalized image generation holds great promise for assisting humans in everyday work and life because of its ability to creatively generate personalized content. However, current evaluations are either automated but misaligned with human judgment, or rely on human evaluation, which is time-consuming and expensive. In this work, we present DreamBench++, a human-aligned benchmark automated by advanced multimodal GPT models. Specifically, we systematically design the prompts so that GPT is both human-aligned and self-aligned, empowered with task reinforcement. Further, we construct a comprehensive dataset comprising diverse images and prompts. By benchmarking 7 modern generative models, we demonstrate that DreamBench++ yields significantly more human-aligned evaluation, helping boost the community with innovative findings.