Improved Corruption Robust Algorithms for Episodic Reinforcement Learning

Abstract

We study episodic reinforcement learning under unknown adversarialcorruptions in both the rewards and the transition probabilities of theunderlying system. We propose new algorithms which, compared to the existingresults in (Lykouris et al., 2020), achieve strictly better regret bounds interms of total corruptions for the tabular setting. To be specific, firstly,our regret bounds depend on more precise numerical values of total rewardscorruptions and transition corruptions, instead of only on the total number ofcorrupted episodes. Secondly, our regret bounds are the first of their kind inthe reinforcement learning setting to have the number of corruptions show upadditively with respect to $\sqrt{T}$ rather than multiplicatively. Our resultsfollow from a general algorithmic framework that combines corruption-robustpolicy elimination meta-algorithms, and plug-in reward-free explorationsub-algorithms. Replacing the meta-algorithm or sub-algorithm may extend theframework to address other corrupted settings with potentially more structure.

Quick Read (beta)

loading the full paper ...