Exploiting Language Model for Efficient Linguistic Steganalysis: An Empirical Study

Abstract

Recent advances in linguistic steganalysis have successively applied CNNs, RNNs, GNNs, and other deep learning models to detect secret information in generative texts. These methods tend to seek stronger feature extractors to achieve better steganalysis performance. However, our experiments show that automatically generated steganographic texts and carrier texts differ significantly in the conditional probability distribution of individual words. The language model used to generate the steganographic texts naturally captures this statistical difference. This motivates us to give the classifier prior knowledge of that language model to enhance its steganalysis ability. To this end, we present two methods for efficient linguistic steganalysis in this paper. One is to pre-train an RNN-based language model, and the other is to pre-train a sequence autoencoder. Experimental results show that both methods improve performance to different degrees compared with a randomly initialized RNN classifier, and both significantly accelerate convergence. Moreover, our methods achieve the best detection results.
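The abstract rests on the idea that a language model assigns each word a conditional probability given its context, and that these per-word probabilities carry a usable statistical signal. The sketch below shows only how such per-word conditional probabilities can be computed and averaged into a single score. It is a toy under stated assumptions, not the authors' method. The paper pre-trains an RNN language model, but here an add-one-smoothed bigram model stands in for it. The helper names `train_bigram`, `word_log_probs`, and `mean_log_prob` are hypothetical. The usage lines compare fluent text with a scrambled sentence as a sanity check. They do not demonstrate cover-versus-stego detection.

```python
import math
from collections import Counter, defaultdict


def train_bigram(corpus):
    """Count bigrams over whitespace-tokenized sentences.

    Hypothetical helper: a bigram model stands in for the RNN language
    model described in the paper.
    """
    context_counts = Counter()           # how often each token acts as a context
    bigram_counts = defaultdict(Counter)  # bigram_counts[prev][cur]
    vocab = set()
    for sentence in corpus:
        tokens = ["<s>"] + sentence.split() + ["</s>"]
        vocab.update(tokens)
        for prev, cur in zip(tokens, tokens[1:]):
            context_counts[prev] += 1
            bigram_counts[prev][cur] += 1
    return context_counts, bigram_counts, vocab


def word_log_probs(sentence, model):
    """Return log P(w_t | w_{t-1}) for every word, with add-one smoothing."""
    context_counts, bigram_counts, vocab = model
    v = len(vocab) + 1  # one extra smoothed slot for out-of-vocabulary words
    tokens = ["<s>"] + sentence.split() + ["</s>"]
    scores = []
    for prev, cur in zip(tokens, tokens[1:]):
        numerator = bigram_counts[prev][cur] + 1
        denominator = context_counts[prev] + v
        scores.append(math.log(numerator / denominator))
    return scores


def mean_log_prob(sentence, model):
    """Average per-word log-probability: one scalar feature per text."""
    scores = word_log_probs(sentence, model)
    return sum(scores) / len(scores)


if __name__ == "__main__":
    corpus = ["the cat sat on the mat", "the dog sat on the rug", "a cat sat on a mat"]
    model = train_bigram(corpus)
    fluent = mean_log_prob("the cat sat on the mat", model)
    scrambled = mean_log_prob("mat the on sat cat the", model)
    print(fluent > scrambled)  # True
```

In the paper's setting, the language model's knowledge enters the classifier through pre-trained weights rather than through a single hand-built score like this one. The toy is meant only to make the notion of "conditional probability of individual words" concrete.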
