Abstract
Recent work in reinforcement learning has focused on several characteristics of learned policies that go beyond maximizing reward. These properties include fairness, explainability, generalization, and robustness. In this paper, we define interventional robustness (IR), a measure of how much variability is introduced into learned policies by incidental aspects of the training procedure, such as the order of training data or the particular exploratory actions taken by agents. A training procedure has high IR when the agents it produces take very similar actions under intervention, despite variation in these incidental aspects of the training procedure. We develop an intuitive, quantitative measure of IR and calculate it for eight algorithms in three Atari environments across dozens of interventions and states. From these experiments, we find that IR varies with the amount of training and type of algorithm and that, contrary to what one might expect, high performance does not imply high IR.