Adversarial Attacks to Reward Machine-based Reinforcement Learning

Abstract

In recent years, Reward Machines (RMs) have stood out as a simple yet effective automata-based formalism for exposing and exploiting task structure in reinforcement learning settings. Despite their relevance, little to no attention has been directed to the study of their security implications and robustness to adversarial scenarios, likely due to their recent appearance in the literature. With my thesis, I aim to provide the first analysis of the security of RM-based reinforcement learning techniques, in the hope of motivating further research in the field. I also propose and evaluate a novel class of attacks on RM-based techniques: blinding attacks.
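To make the formalism concrete, the following is a minimal sketch of a reward machine. It assumes the usual formulation from the RM literature: a finite automaton whose transitions are triggered by labels (sets of propositions true at a time step) and which emits a reward on each transition. As a simplification, transitions here are keyed on single propositions rather than full truth assignments. The class, the two-step "coffee, then office" task, and the `hide` helper are all hypothetical illustrations. In particular, `hide` only shows why the labels an RM consumes are a natural attack surface; it is not claimed to be the blinding attack defined in the thesis.

```python
from dataclasses import dataclass, field

@dataclass
class RewardMachine:
    """Hypothetical minimal reward machine (simplified, illustrative only)."""
    initial: str
    # (state, proposition) -> (next_state, reward); any other label self-loops with 0 reward
    transitions: dict = field(default_factory=dict)

    def step(self, state, label):
        """Advance on one label (the set of propositions true at this step)."""
        for prop in sorted(label):  # sorted for a deterministic choice of transition
            if (state, prop) in self.transitions:
                return self.transitions[(state, prop)]
        return state, 0.0

    def run(self, labels):
        """Return the final RM state and the total reward for a label sequence."""
        state, total = self.initial, 0.0
        for label in labels:
            state, reward = self.step(state, label)
            total += reward
        return state, total

def hide(labels, prop):
    """Hypothetical tampering: remove one proposition from every label."""
    return [label - {prop} for label in labels]

# Task: get coffee ("c"), then reach the office ("o").
rm = RewardMachine(
    initial="u0",
    transitions={("u0", "c"): ("u1", 0.0), ("u1", "o"): ("done", 1.0)},
)
trace = [set(), {"c"}, set(), {"o"}]

print(rm.run(trace))             # ('done', 1.0)
print(rm.run(hide(trace, "c")))  # ('u0', 0.0): the RM never leaves its initial state
```

Because the agent's reward is computed from the RM state rather than directly from the environment, corrupting a single proposition in the label stream is enough to keep the machine stuck and to suppress the task reward, even though the underlying trajectory is unchanged.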
