Abstract
In the network security arms race, the defender is significantly disadvantaged as they need to successfully detect and counter every malicious attack. In contrast, the attacker needs to succeed only once. To level the playing field, we investigate the effectiveness of autonomous agents in a realistic network defence scenario. We first outline the problem, provide the background on reinforcement learning and detail our proposed agent design. Using a network environment simulation, with 13 hosts spanning 3 subnets, we train a novel reinforcement learning agent and show that it can reliably defend against continual attacks by two advanced persistent threat (APT) red agents: one with complete knowledge of the network layout and another which must discover resources through exploration but is more general.