SAGI: Semantically Aligned and Uncertainty Guided AI Image Inpainting

Abstract

Recent advancements in generative AI have made text-guided image inpainting (adding, removing, or altering image regions using textual prompts) widely accessible. However, generating semantically correct, photorealistic imagery typically requires carefully crafted prompts and iterative refinement by evaluating the realism of the generated content, tasks commonly performed by humans. To automate the generative process, we propose Semantically Aligned and Uncertainty Guided AI Image Inpainting (SAGI), a model-agnostic pipeline that samples prompts from a distribution closely aligned with human perception, evaluates the generated content, and discards instances that deviate from that distribution, which we approximate using pretrained large language models and vision-language models. By applying this pipeline to multiple state-of-the-art inpainting models, we create the SAGI Dataset (SAGI-D), currently the largest and most diverse dataset of AI-generated inpaintings, comprising over 95k inpainted images and a human-evaluated subset. Our experiments show that semantic alignment significantly improves image quality and aesthetics, while uncertainty guidance effectively identifies realistic manipulations: human accuracy in distinguishing inpainted images from real ones drops from 74% to 35% after applying our pipeline. Moreover, using SAGI-D to train several image forensic approaches increases in-domain detection performance by 37.4% on average and out-of-domain generalization by 26.1% in terms of IoU, demonstrating its utility in countering malicious exploitation of generative AI. Code and dataset are available at https://mever-team.github.io/SAGI/
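
The forensic results above are reported in terms of IoU (Intersection over Union) between a predicted manipulation mask and the ground-truth inpainted region. The abstract does not describe the exact evaluation protocol (thresholding, per-image versus dataset-level averaging), so the following is only a minimal sketch of the standard IoU definition for two binary masks. The function name `mask_iou` and the convention of returning 1.0 when both masks are empty are assumptions made for illustration, not details taken from the paper.

```python
from typing import Sequence

# A binary mask as nested rows of 0/1 values (hypothetical representation).
Mask = Sequence[Sequence[int]]


def mask_iou(pred: Mask, target: Mask) -> float:
    """Intersection over Union between two binary masks of equal shape."""
    same_rows = len(pred) == len(target)
    if not same_rows or any(len(p) != len(t) for p, t in zip(pred, target)):
        raise ValueError("masks must have the same shape")

    intersection = 0  # pixels flagged in both masks
    union = 0         # pixels flagged in at least one mask
    for pred_row, target_row in zip(pred, target):
        for p, t in zip(pred_row, target_row):
            p_on, t_on = bool(p), bool(t)
            intersection += p_on and t_on
            union += p_on or t_on

    # Assumed convention: two empty masks count as perfect agreement.
    return intersection / union if union else 1.0


predicted = [[0, 1, 1],
             [0, 1, 0]]
ground_truth = [[0, 1, 0],
                [0, 1, 1]]
print(mask_iou(predicted, ground_truth))  # 2 shared pixels / 4 flagged pixels = 0.5
```

A higher IoU means the detector's predicted mask overlaps the truly inpainted region more closely.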
