Abstract
Security vulnerabilities play a vital role in network security systems. Fuzzing is widely used as a vulnerability discovery technique to reduce damage in advance. However, traditional fuzzing techniques face many challenges, such as how to mutate input seed files, how to increase code coverage, and how to effectively bypass verification. Machine learning has been introduced into fuzz testing as a new method to alleviate these challenges. This paper reviews recent research progress on using machine learning for fuzz testing, analyzes how machine learning improves the fuzzing process and its results, and sheds light on future work in fuzzing. First, this paper discusses why machine learning techniques can be applied to fuzzing scenarios and identifies six different stages in which machine learning has been used. Then, this paper systematically studies machine learning based fuzzing models in terms of the choice of machine learning algorithm, pre-processing methods, datasets, evaluation metrics, and hyperparameter settings. Next, this paper assesses the performance of the machine learning models based on the frequently used evaluation metrics. The results of the evaluation show that machine learning has an acceptable capability for classification-based prediction in fuzzing. Finally, the capability of discovering vulnerabilities is compared between traditional fuzzing tools and machine learning based fuzzing tools. The results show that the introduction of machine learning can improve the performance of fuzzing. However, there are still some limitations, such as unbalanced training samples and the difficulty of extracting features related to vulnerabilities.