Using Fully Convolutional Neural Networks to detect manipulated images in videos

Abstract

We propose a compact architecture based on fully convolutional neural networks (FCN) to detect manipulated images of human faces. In contrast to existing FCN architectures for classification, here the final-layer feature map exhibits large spatial dimensions with a non-global receptive field. The final-layer features are spatially averaged using global average pooling (GAP) to provide more robust features. We leverage the structure of the FCN to derive a straightforward way for joint classification and forgery localization training and show that the network's classification performance improves significantly by the addition of a pixelwise classification loss. The trained networks achieve state-of-the-art results in binary classification on the FaceForensics++ dataset and competitive performance in other tasks using a significantly reduced number of parameters and small-resolution input images. Additionally, we examine how well the proposed architecture can detect fully generated images using faces from the recently proposed PGAN and StyleGAN methods. We show that this task is easier to learn than detecting manipulated images and that for both cases there is only a small drop in performance when the network is trained using more than one manipulation technique in the training data.
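
The interplay between global average pooling and the pixelwise loss can be illustrated with a small sketch. The abstract does not give the exact formulation, so everything below is an assumption made for illustration: the final layer is taken to emit one forgery logit per spatial location, the image-level logit is the spatial average of that map, and the two loss terms are combined with a hypothetical weight `pixel_weight`. The function names are likewise invented here and are not taken from the paper.

```python
import math


def bce_with_logit(logit, target):
    """Numerically stable binary cross-entropy for one logit and a 0/1 target."""
    return max(logit, 0.0) - logit * target + math.log1p(math.exp(-abs(logit)))


def global_average_pool(logit_map):
    """Average a 2-D map (a list of rows) into a single scalar."""
    values = [v for row in logit_map for v in row]
    return sum(values) / len(values)


def joint_loss(logit_map, label, mask, pixel_weight=1.0):
    """Image-level loss on the pooled logit plus a pixelwise localization loss.

    logit_map    -- H x W per-location forgery logits (assumed network output)
    label        -- 1 if the image is manipulated, 0 otherwise
    mask         -- H x W ground-truth map, 1 where a pixel was manipulated
    pixel_weight -- hypothetical weighting of the localization term
    """
    image_logit = global_average_pool(logit_map)
    cls_loss = bce_with_logit(image_logit, label)

    pixel_losses = [
        bce_with_logit(logit, target)
        for logit_row, mask_row in zip(logit_map, mask)
        for logit, target in zip(logit_row, mask_row)
    ]
    loc_loss = sum(pixel_losses) / len(pixel_losses)

    return cls_loss + pixel_weight * loc_loss


# Toy 2 x 2 example: a manipulated image whose right column was altered.
logits = [[2.0, -2.0], [0.0, 4.0]]
mask = [[0, 1], [0, 1]]
total = joint_loss(logits, label=1, mask=mask, pixel_weight=1.0)
```

Under these assumptions, setting `pixel_weight=0.0` recovers plain image-level classification, so the localization term acts as an additional supervision signal on the same feature map rather than as a separate network head.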
