Entity-based Reinforcement Learning for Autonomous Cyber Defence

Abstract

A significant challenge for autonomous cyber defence is ensuring a defensive agent's ability to generalise across diverse network topologies and configurations. This capability is necessary for agents to remain effective when deployed in dynamically changing environments, such as an enterprise network where devices may frequently join and leave. Standard approaches to deep reinforcement learning, where policies are parameterised using a fixed-input multi-layer perceptron (MLP), expect fixed-size observation and action spaces. In autonomous cyber defence, this makes it hard to develop agents that generalise to environments whose network topologies differ from those seen in training, as the number of nodes affects the natural size of the observation and action spaces. To overcome this limitation, we reframe the problem of autonomous network defence using entity-based reinforcement learning, where the observation and action spaces of an agent are decomposed into a collection of discrete entities. This framework enables the use of policy parameterisations specialised in compositional generalisation. We train a Transformer-based policy on the Yawning Titan cyber-security simulation environment and test its generalisation capabilities across various network topologies. We demonstrate that this approach significantly outperforms an MLP-based policy when training across fixed-size networks of varying topologies, and matches performance when training on a single network. We also demonstrate the potential for zero-shot generalisation to networks of a different size to those seen in training. These findings highlight the potential for entity-based reinforcement learning to advance the field of autonomous cyber defence by providing more generalisable policies capable of handling variations in real-world network environments.
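To make the contrast with a fixed-input MLP concrete, the following is a minimal sketch of an entity-based policy head in plain Python. It is not the Transformer policy trained in the paper: it uses a single parameter-light self-attention step (no learned query, key, or value projections) followed by a scorer whose weights are shared across entities. The per-node features, the function names, and the weight layout are all assumptions made for illustration. The point it shows is that the same parameters can produce one action probability per node for networks of different sizes.

```python
import math
import random

# Hypothetical per-node features, e.g. (compromised, vulnerability, isolated).
FEATURES_PER_NODE = 3


def make_params(seed=0):
    """Create one small weight set that is shared by every node entity."""
    rng = random.Random(seed)
    return {
        "w_self": [rng.uniform(-1, 1) for _ in range(FEATURES_PER_NODE)],
        "w_context": [rng.uniform(-1, 1) for _ in range(FEATURES_PER_NODE)],
    }


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def softmax(values):
    peak = max(values)  # subtract the max for numerical stability
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def attend(nodes):
    """Mix each node with all others via scaled dot-product self-attention."""
    scale = math.sqrt(FEATURES_PER_NODE)
    mixed = []
    for query in nodes:
        weights = softmax([dot(query, key) / scale for key in nodes])
        mixed.append([
            sum(w * node[i] for w, node in zip(weights, nodes))
            for i in range(FEATURES_PER_NODE)
        ])
    return mixed


def entity_policy(params, nodes):
    """Return one action probability per node, for any number of nodes."""
    mixed = attend(nodes)
    # The same weights score every entity, so the parameter count does not
    # depend on how many nodes the network contains.
    logits = [
        dot(params["w_self"], node) + dot(params["w_context"], context)
        for node, context in zip(nodes, mixed)
    ]
    # Softmax over entities: the action space grows and shrinks with the network.
    return softmax(logits)


params = make_params()
small_network = [[1.0, 0.2, 0.0], [0.0, 0.8, 0.0], [0.0, 0.5, 1.0]]
large_network = small_network + [[1.0, 0.9, 0.0], [0.0, 0.1, 0.0]]

probs_small = entity_policy(params, small_network)
probs_large = entity_policy(params, large_network)
print(len(probs_small), len(probs_large))  # prints: 3 5
```

An MLP with a fixed input width would need retraining or padding to accept the five-node network, whereas this sketch applies the unchanged `params` to both. Reordering the nodes only reorders the output probabilities, which is the kind of structure an entity-based formulation is meant to exploit.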
