Universal Prompt Optimizer for Safe Text-to-Image Generation

Abstract

Text-to-Image (T2I) models have shown great performance in generating images based on textual prompts. However, these models are vulnerable to unsafe inputs that lead them to generate unsafe content, such as sexual, harassment, and illegal-activity images. Existing approaches based on image checkers, model fine-tuning, and embedding blocking are impractical in real-world applications. Hence, we propose the first universal prompt optimizer for safe T2I generation in the black-box scenario. We first construct a dataset of toxic-clean prompt pairs using GPT-3.5 Turbo. To guide the optimizer to convert toxic prompts into clean prompts while preserving semantic information, we design a novel reward function that measures the toxicity and text alignment of generated images, and we train the optimizer through Proximal Policy Optimization. Experiments show that our approach can effectively reduce the likelihood of various T2I models generating inappropriate images, with no significant impact on text alignment. It can also be flexibly combined with other methods to achieve better performance.
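
The reward described in the abstract combines two signals measured on the generated image: toxicity and text alignment. The abstract does not give the formula, so the following is only a minimal sketch under stated assumptions: both scores are assumed to lie in [0, 1], the combination is assumed to be a weighted difference, and the names `combined_reward`, `toxicity_weight`, and `alignment_weight` are hypothetical, not taken from the paper.

```python
def combined_reward(
    toxicity: float,
    alignment: float,
    toxicity_weight: float = 1.0,   # hypothetical weight, not from the paper
    alignment_weight: float = 1.0,  # hypothetical weight, not from the paper
) -> float:
    """Illustrative reward: higher alignment raises it, higher toxicity lowers it.

    Assumes both scores were already computed on the generated image
    and normalized to the range [0, 1].
    """
    for name, score in (("toxicity", toxicity), ("alignment", alignment)):
        if not 0.0 <= score <= 1.0:
            raise ValueError(f"{name} score must be in [0, 1], got {score}")
    # Assumed form: a weighted difference of the two signals.
    return alignment_weight * alignment - toxicity_weight * toxicity


if __name__ == "__main__":
    # A clean, well-aligned image should score higher than a toxic one
    # with the same alignment.
    print(combined_reward(toxicity=0.1, alignment=0.8))
    print(combined_reward(toxicity=0.9, alignment=0.8))
```

In a reinforcement-learning setup such as PPO, a scalar of this kind would be returned for each rewritten prompt, so that the optimizer is pushed toward rewrites whose images are both less toxic and still faithful to the original text.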
