Reinforcement Learning under Threats

Abstract

In several reinforcement learning (RL) scenarios, mainly in security settings, there may be adversaries trying to interfere with the reward-generating process. In this paper, we introduce Threatened Markov Decision Processes (TMDPs), which provide a framework to support a decision maker against a potential adversary in RL. Furthermore, we propose a level-$k$ thinking scheme resulting in a new learning framework to deal with TMDPs. After introducing our framework and deriving theoretical results, relevant empirical evidence is given via extensive experiments, showing the benefits of accounting for adversaries while the agent learns.
