Robust Model-free Reinforcement Learning with Multi-objective Bayesian Optimization

Abstract

In reinforcement learning (RL), an autonomous agent learns to perform complex tasks by maximizing an exogenous reward signal while interacting with its environment. In real-world applications, test conditions may differ substantially from the training scenario and, therefore, focusing on pure reward maximization during training may lead to poor results at test time. In these cases, it is important to trade off performance against robustness while learning a policy. While several results exist for robust, model-based RL, the model-free case has not been widely investigated. In this paper, we cast the robust, model-free RL problem as a multi-objective optimization problem. To quantify the robustness of a policy, we use delay margin and gain margin, two robustness indicators that are common in control theory. We show how these metrics can be estimated from data in the model-free setting. We use multi-objective Bayesian optimization (MOBO) to efficiently solve this expensive-to-evaluate, multi-objective optimization problem. We show the benefits of our robust formulation in both sim-to-real and pure hardware experiments to balance a Furuta pendulum.
