Abstract
In this paper, we introduce the idea of using adversarially generated samples of the input images that were classified as deepfakes by a detector, to form perturbation masks for inferring the importance of different input features and producing visual explanations. We generate these samples based on Natural Evolution Strategies, aiming to flip the original deepfake detector's decision so that it classifies these samples as real. We apply this idea to four perturbation-based explanation methods (LIME, SHAP, SOBOL and RISE) and evaluate the performance of the resulting modified methods using a state-of-the-art deepfake detection model, a benchmarking dataset (FaceForensics++) and a corresponding explanation evaluation framework. Our quantitative assessments document the mostly positive contribution of the proposed perturbation approach to the performance of the explanation methods. Our qualitative analysis shows the capacity of the modified explanation methods to demarcate the manipulated image regions more accurately, and thus to provide more useful explanations.
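As a rough illustration of the idea summarized above, the following minimal sketch estimates a gradient with a Natural Evolution Strategies style, query-only scheme and uses it to push a "fake" input towards the "real" side of a detector; the resulting sample is then used to fill the masked-out regions of a perturbation. Everything here is an assumption made for illustration: `toy_detector` is a hypothetical stand-in for a real deepfake detector, the image is a flat list of values in [0, 1], and the hyperparameters (`sigma`, `lr`, `pop`, `eps`, `steps`) are arbitrary. It is not the configuration or implementation used in this work.

```python
import math
import random


def toy_detector(x):
    """Hypothetical detector: returns P(fake) for a flat list of pixel values."""
    mean = sum(x) / len(x)
    return 1.0 / (1.0 + math.exp(-10.0 * (mean - 0.5)))


def nes_attack(x, detector, sigma=0.05, lr=0.02, pop=20, eps=0.3,
               steps=200, seed=0):
    """Query-only attack: lower P(fake) using a NES-style gradient estimate."""
    rng = random.Random(seed)
    n = len(x)
    x_adv = list(x)
    for _ in range(steps):
        if detector(x_adv) < 0.5:  # decision flipped to "real", stop early
            break
        grad = [0.0] * n
        for _ in range(pop):
            # Antithetic Gaussian probes around the current sample.
            u = [rng.gauss(0.0, 1.0) for _ in range(n)]
            plus = detector([a + sigma * b for a, b in zip(x_adv, u)])
            minus = detector([a - sigma * b for a, b in zip(x_adv, u)])
            w = (plus - minus) / (2.0 * sigma)
            for i in range(n):
                grad[i] += w * u[i] / pop
        for i in range(n):
            # Signed descent step on P(fake).
            sign = (grad[i] > 0) - (grad[i] < 0)
            v = x_adv[i] - lr * sign
            # Stay within eps of the original and within the valid range.
            v = max(x[i] - eps, min(x[i] + eps, v))
            x_adv[i] = max(0.0, min(1.0, v))
    return x_adv


def perturb(x, x_adv, mask):
    """Keep x where mask is 1; use the adversarial value where mask is 0."""
    return [a if m == 1 else b for a, b, m in zip(x, x_adv, mask)]
```

In a perturbation-based explainer such as LIME or RISE, a function like the hypothetical `perturb` above would take the place of the usual occlusion step (for example, replacing masked regions with a constant value), so that the detector's response to each mask reflects content that it already regards as real.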