Online Nonstochastic Model-Free Reinforcement Learning

Abstract

In this work, we explore robust model-free reinforcement learning algorithms for environments that may be dynamic or even adversarial. Conventional state-based policies fail to accommodate the challenge imposed by the presence of unmodeled disturbances in such settings. Additionally, linear state-based policies pose an obstacle to efficient optimization, leading to nonconvex objectives even in benign environments like linear dynamical systems. Drawing inspiration from recent advancements in model-based control, we introduce a novel class of policies centered on disturbance signals. We define several categories of these signals, referred to as pseudo-disturbances, and corresponding policy classes based on them. We provide efficient and practical algorithms for optimizing these policies. Next, we examine the task of online adaptation of reinforcement learning agents to adversarial disturbances. Our methods can be integrated with any black-box model-free approach, resulting in provable regret guarantees if the underlying dynamics is linear. We evaluate our method on different standard RL benchmarks and demonstrate improved robustness.
