A Task is Worth One Word: Learning with Task Prompts for High-Quality Versatile Image Inpainting

Abstract

Achieving high-quality versatile image inpainting, where user-specified regions are filled with plausible content according to user intent, presents a significant challenge. Existing methods face difficulties in simultaneously addressing context-aware image inpainting and text-guided object inpainting due to the distinct optimal training strategies required. To overcome this challenge, we introduce PowerPaint, the first high-quality and versatile inpainting model that excels in both tasks. First, we introduce learnable task prompts along with tailored fine-tuning strategies to guide the model's focus on different inpainting targets explicitly. This enables PowerPaint to accomplish various inpainting tasks by utilizing different task prompts, resulting in state-of-the-art performance. Second, we demonstrate the versatility of the task prompt in PowerPaint by showcasing its effectiveness as a negative prompt for object removal. Additionally, we leverage prompt interpolation techniques to enable controllable shape-guided object inpainting. Finally, we extensively evaluate PowerPaint on various inpainting benchmarks to demonstrate its superior performance for versatile image inpainting. We release our code and models on our project page: https://powerpaint.github.io/.
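
To make the idea of prompt interpolation concrete, the following is a minimal sketch, not the authors' implementation. It assumes that two task prompts are represented as embedding vectors of equal length and that they are blended linearly with a weight alpha; the abstract does not state the interpolation scheme, so the linear form, the function name `interpolate_prompts`, and the toy vectors `p_context` and `p_shape` are all hypothetical.

```python
from typing import List


def interpolate_prompts(e_a: List[float], e_b: List[float], alpha: float) -> List[float]:
    """Linearly blend two prompt embeddings.

    alpha = 0 returns e_a unchanged, alpha = 1 returns e_b unchanged.
    Assumption: linear blending; the abstract does not specify the scheme.
    """
    if len(e_a) != len(e_b):
        raise ValueError("embeddings must have the same length")
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be in [0, 1]")
    return [(1.0 - alpha) * a + alpha * b for a, b in zip(e_a, e_b)]


# Hypothetical toy embeddings standing in for two learned task prompts.
p_context = [1.0, 0.0, 2.0]
p_shape = [0.0, 4.0, 2.0]

# A small alpha keeps the result close to the first prompt.
blended = interpolate_prompts(p_context, p_shape, 0.25)
print(blended)
```

Under this assumption, sweeping alpha from 0 to 1 would move the conditioning gradually from one task prompt to the other, which is one plausible way to expose a single user-facing control over how closely the inpainted object follows the mask shape.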
