Abstract
Fuzzing is a commonly used technique designed to test software by automatically crafting program inputs. Currently, the most successful fuzzing algorithms emphasize simple, low-overhead strategies with the ability to efficiently monitor program state during execution. Through compile-time instrumentation, these approaches have access to numerous aspects of program state, including coverage, data flow, and heterogeneous fault detection and classification. However, existing approaches use blind random mutation strategies when generating test inputs. We present a different approach that uses this state information to optimize mutation operators with reinforcement learning (RL). By integrating OpenAI Gym with libFuzzer, we are able to simultaneously leverage advances in reinforcement learning and fuzzing to achieve deeper coverage across several varied benchmarks. Our technique connects the rich, efficient program monitors provided by LLVM Sanitizers with a deep neural network to learn mutation selection strategies directly from the input data. The cross-language, asynchronous architecture we developed enables us to apply any OpenAI Gym compatible deep reinforcement learning algorithm to any fuzzing problem with minimal slowdown.