On Designing Machine Learning Models for Malicious Network Traffic Classification

Abstract

Machine learning (ML) is becoming widely deployed in cyber security settings to shorten the detection cycle of cyber attacks. To date, most ML-based systems are either proprietary or make specific choices of feature representations and machine learning models. The success of these techniques is difficult to assess, as public benchmark datasets are currently unavailable. In this paper, we provide concrete guidelines and recommendations for using supervised ML in cyber security. As a case study, we consider the problem of botnet detection from network traffic data. Among our findings, we highlight that: (1) feature representations should take attack characteristics into consideration; (2) ensemble models are well-suited to handle class imbalance; (3) the granularity of ground truth plays an important role in the success of these methods.
