Emulating malware authors for proactive protection using GANs over a distributed image visualization of the dynamic file behavior

Abstract

Malware authors have always had the advantage of being able to adversarially test and augment their malicious code against the anti-malware products at their disposal before deploying their payload. Anti-malware developers and threat experts, on the other hand, do not have the privilege of proactively tuning anti-malware products against zero-day attacks. This allows malware authors to stay a step ahead of anti-malware products, fundamentally biasing the cat-and-mouse game played by the two parties. In this paper, we propose a way to bridge that gap by enabling machine-learning-based threat prevention models to be tuned against a deep generative adversarial network (GAN) that takes on the role of a malware author and generates new types of malware. The GAN is trained over a reversible, distributed RGB image representation of known malware behaviors, which encodes the sequence of API call n-grams and their corresponding term frequencies. The generated images represent synthetic malware that can be decoded back to the underlying API call sequence information. We demonstrate the image representation not only as a general technique for incorporating the priors needed to exploit convolutional neural network architectures for generative or discriminative modeling, but also as a visualization method for easy manual software or malware categorization, since individual API n-gram information is distributed across the image space. In addition, we propose using smart definitions for detecting malware based on perceptual hashing of these images. Such hashes are potentially more effective than cryptographic hashes, which carry no meaningful similarity metric and hence do not generalize well.
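The abstract does not specify how API n-grams and their term frequencies are laid out in the RGB image, or which perceptual hash is used. The Python below is a minimal, hypothetical sketch of the two ideas: a reversible n-gram-to-image encoding, and a hash whose distance tracks behavioral similarity. It assumes a fixed n-gram vocabulary and places one pixel per n-gram using a seeded shuffle, so that entries are scattered across the image. Each term frequency is stored in base 256 over the R, G and B bytes of its pixel. The hash follows the average-hash (aHash) pattern, thresholding each pixel's encoded value against the image mean. All names (`api_ngrams`, `NgramImageCodec`, `average_hash`, `image_digest`) and the Windows API call names in the demo are illustrative assumptions, not the paper's implementation.

```python
import random
from collections import Counter
from hashlib import sha256


def api_ngrams(calls, n):
    """Term frequencies of the contiguous n-grams in an API call trace."""
    return Counter(tuple(calls[i:i + n]) for i in range(len(calls) - n + 1))


def pixel_value(px):
    """Read an (R, G, B) pixel as one 24-bit big-endian integer."""
    red, green, blue = px
    return (red << 16) | (green << 8) | blue


class NgramImageCodec:
    """Hypothetical reversible codec: one pixel per vocabulary n-gram.

    A seeded shuffle picks each n-gram's pixel, so entries are scattered over
    the image instead of packed in vocabulary order. The term frequency is
    stored in base 256 across the R, G and B bytes of that pixel.
    """

    MAX_COUNT = 256 ** 3 - 1  # largest frequency that fits in three bytes

    def __init__(self, vocab, width, seed=0):
        if len(vocab) > width * width:
            raise ValueError("vocabulary does not fit in a width x width image")
        self.width = width
        cells = list(range(width * width))
        random.Random(seed).shuffle(cells)
        self.cell_of = {gram: cells[i] for i, gram in enumerate(vocab)}
        self.gram_at = {cell: gram for gram, cell in self.cell_of.items()}

    def encode(self, tf):
        img = [[(0, 0, 0)] * self.width for _ in range(self.width)]
        for gram, count in tf.items():
            if gram not in self.cell_of:
                continue  # out-of-vocabulary n-grams are dropped in this sketch
            count = min(count, self.MAX_COUNT)  # saturate instead of overflowing
            row, col = divmod(self.cell_of[gram], self.width)
            img[row][col] = (count >> 16, (count >> 8) & 0xFF, count & 0xFF)
        return img

    def decode(self, img):
        tf = Counter()
        for cell, gram in self.gram_at.items():
            row, col = divmod(cell, self.width)
            count = pixel_value(img[row][col])
            if count:
                tf[gram] = count
        return tf


def average_hash(img):
    """aHash-style bits: True where a pixel's encoded value exceeds the mean."""
    values = [pixel_value(px) for row in img for px in row]
    mean = sum(values) / len(values)
    return [v > mean for v in values]


def hamming(a, b):
    """Number of positions at which two equal-length bit lists differ."""
    return sum(x != y for x, y in zip(a, b))


def image_digest(img):
    """Cryptographic baseline: SHA-256 over the raw RGB bytes."""
    return sha256(bytes(v for row in img for px in row for v in px)).hexdigest()


if __name__ == "__main__":
    trace = ["NtOpenFile", "NtReadFile", "NtReadFile", "NtClose"]
    variant = trace + ["NtOpenFile", "NtReadFile"]  # a slightly mutated sample
    tf_a, tf_b = api_ngrams(trace, 2), api_ngrams(variant, 2)
    codec = NgramImageCodec(sorted(set(tf_a) | set(tf_b)), width=4, seed=7)
    img_a, img_b = codec.encode(tf_a), codec.encode(tf_b)
    assert codec.decode(img_a) == tf_a  # the image decodes back to the n-grams
    print("aHash Hamming distance:", hamming(average_hash(img_a), average_hash(img_b)))
    print("SHA-256 digests equal:", image_digest(img_a) == image_digest(img_b))
```

In the demo, appending two calls adds a single new bigram, so the two 16-bit perceptual hashes differ in exactly one position. The SHA-256 digests, by contrast, simply differ and give no graded notion of how close the two samples are. A standard aHash would also downsample the image (typically to 8×8) and convert it to grayscale before thresholding. This sketch omits those steps because its image is already tiny and its pixel values are frequencies rather than colors.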
