An Ensemble of Pre-trained Transformer Models For Imbalanced Multiclass Malware Classification

Abstract

Classification of malware families is crucial for a comprehensive understanding of how they can infect devices, computers, or systems. Thus, malware identification enables security researchers and incident responders to take precautions against malware and accelerate mitigation. API call sequences made by malware are widely used as features by machine and deep learning models for malware classification, as these sequences represent the behavior of malware. However, traditional machine and deep learning models remain incapable of capturing sequence relationships between API calls. On the other hand, transformer-based models process sequences as a whole and learn relationships between API calls due to multi-head attention mechanisms and positional embeddings. Our experiments demonstrate that the transformer model with one transformer block layer surpassed the widely used base architecture, LSTM. Moreover, the pre-trained transformer models BERT or CANINE outperformed these models in classifying highly imbalanced malware families according to the evaluation metrics, F1-score and AUC score. Furthermore, the proposed bagging-based random transformer forest (RTF), an ensemble of BERT or CANINE, has reached state-of-the-art evaluation scores on three out of four datasets, in particular a state-of-the-art F1-score of 0.6149 on one of the commonly used benchmark datasets.
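
The bagging idea behind an ensemble such as RTF can be illustrated with a minimal sketch. It is not the authors' implementation: the base learner `TokenCountClassifier` is a hypothetical stand-in for a fine-tuned BERT or CANINE model, the API call names and family labels are made up for illustration, and combining predictions by majority vote is an assumption, since the abstract does not state the aggregation rule.

```python
import random
from collections import Counter, defaultdict


def bootstrap_sample(data, rng):
    """Draw len(data) items with replacement (one bagging replicate)."""
    return [rng.choice(data) for _ in data]


class TokenCountClassifier:
    """Hypothetical stand-in for a fine-tuned BERT or CANINE model.

    It scores each class by how often the query's API calls were seen
    in that class during training, normalized by class size.
    """

    def fit(self, samples):
        self.counts = defaultdict(Counter)
        for tokens, label in samples:
            self.counts[label].update(tokens)
        return self

    def predict(self, tokens):
        def score(label):
            total = sum(self.counts[label].values()) or 1  # avoid division by zero
            return sum(self.counts[label][t] for t in tokens) / total

        # sorted() makes tie-breaking deterministic
        return max(sorted(self.counts), key=score)


def train_bagged_ensemble(data, n_models, seed=0):
    """Train n_models base learners, each on its own bootstrap replicate."""
    rng = random.Random(seed)
    return [
        TokenCountClassifier().fit(bootstrap_sample(data, rng))
        for _ in range(n_models)
    ]


def majority_vote(models, tokens):
    """Assumed aggregation rule: the most frequently predicted label wins."""
    votes = Counter(model.predict(tokens) for model in models)
    return votes.most_common(1)[0][0]


# Toy data: (API call sequence, malware family); all names are illustrative.
TOY_DATA = [
    (["CreateFile", "WriteFile"], "trojan"),
    (["CreateFile", "CreateProcess"], "trojan"),
    (["RegOpenKey", "RegSetValue"], "spyware"),
    (["RegOpenKey", "GetKeyState"], "spyware"),
]

if __name__ == "__main__":
    ensemble = train_bagged_ensemble(TOY_DATA, n_models=5)
    print(majority_vote(ensemble, ["CreateFile", "WriteFile"]))
```

Because each base learner sees a different bootstrap replicate, the members disagree on some inputs, and voting averages out part of that variance. With heavily imbalanced families, a replicate may miss a rare class entirely, which is one reason results on such data are usually reported with class-sensitive metrics such as F1-score.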
