Provably Robust Blackbox Optimization for Reinforcement Learning

  • 2019-07-08 12:30:07
  • Krzysztof Choromanski, Aldo Pacchiano, Jack Parker-Holder, Yunhao Tang, Deepali Jain, Yuxiang Yang, Atil Iscen, Jasmine Hsu, Vikas Sindhwani

Abstract

Interest in derivative-free optimization (DFO) and "evolutionary strategies" (ES) has recently surged in the Reinforcement Learning (RL) community, with growing evidence that they can match state-of-the-art methods for policy optimization problems in Robotics. However, it is well known that DFO methods suffer from prohibitively high sampling complexity. They can also be very sensitive to noisy rewards and stochastic dynamics. In this paper, we propose a new class of algorithms, called Robust Blackbox Optimization (RBO). Remarkably, even if up to $23\%$ of all the measurements are arbitrarily corrupted, RBO can provably recover gradients to high accuracy. RBO relies on learning gradient flows using robust regression methods to enable off-policy updates. On several MuJoCo robot control tasks, when all other RL approaches collapse in the presence of adversarial noise, RBO is able to train policies effectively. We also show that RBO can be applied to legged locomotion tasks including path tracking for quadruped robots.
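
The robust-regression idea in the abstract can be illustrated with a small toy sketch. It is not the paper's algorithm. It assumes a linear blackbox objective (so the true gradient is known exactly), Gaussian perturbation directions, and least-absolute-deviations regression solved by iteratively reweighted least squares as the robust regressor. All names (`lad_gradient`, `true_grad`, the sample counts, the corruption size) are hypothetical choices made for illustration.

```python
import random


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def weighted_least_squares(dirs, ys, weights):
    """Minimize sum_i w_i * (y_i - g . d_i)^2 over g via the normal equations."""
    dim = len(dirs[0])
    a = [[sum(w * d[i] * d[j] for d, w in zip(dirs, weights))
          for j in range(dim)] for i in range(dim)]
    b = [sum(w * d[i] * y for d, y, w in zip(dirs, ys, weights))
         for i in range(dim)]
    return solve_linear(a, b)


def lad_gradient(dirs, ys, iters=100, eps=1e-8):
    """Approximate argmin_g sum_i |y_i - g . d_i| by reweighted least squares."""
    g = weighted_least_squares(dirs, ys, [1.0] * len(ys))
    for _ in range(iters):
        residuals = [y - dot(g, d) for d, y in zip(dirs, ys)]
        weights = [1.0 / max(abs(r), eps) for r in residuals]
        g = weighted_least_squares(dirs, ys, weights)
    return g


def l2_error(u, v):
    return sum((a - b) ** 2 for a, b in zip(u, v)) ** 0.5


random.seed(0)
true_grad = [1.0, -2.0, 0.5]           # assumed gradient of the toy objective
num_samples, num_corrupted = 40, 8     # 20% of measurements are corrupted

# Each measurement is f(theta + d) - f(theta) = true_grad . d for a linear f.
dirs = [[random.gauss(0.0, 1.0) for _ in true_grad] for _ in range(num_samples)]
ys = [dot(true_grad, d) for d in dirs]
for i in range(num_corrupted):
    ys[i] += 50.0                      # large corruption on a subset

ls_grad = weighted_least_squares(dirs, ys, [1.0] * num_samples)
robust_grad = lad_gradient(dirs, ys)

print("least-squares error:", l2_error(ls_grad, true_grad))
print("robust (LAD) error: ", l2_error(robust_grad, true_grad))
```

In this toy setting the ordinary least-squares estimate is pulled away by the corrupted measurements, while the least-absolute-deviations estimate stays close to the assumed gradient.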
