Convergent Policy Optimization for Safe Reinforcement Learning

  • 2019-10-26 23:40:46
  • Ming Yu, Zhuoran Yang, Mladen Kolar, Zhaoran Wang
  • 0

Abstract

We study the safe reinforcement learning problem with nonlinear function approximation, where policy optimization is formulated as a constrained optimization problem with both the objective and the constraint being nonconvex functions. For such a problem, we construct a sequence of surrogate convex constrained optimization problems by replacing the nonconvex functions locally with convex quadratic functions obtained from policy gradient estimators. We prove that the solutions to these surrogate problems converge to a stationary point of the original nonconvex problem. Furthermore, to extend our theoretical results, we apply our algorithm to examples of optimal control and multi-agent reinforcement learning with safety constraints.
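
The surrogate construction described in the abstract can be illustrated on a toy problem. The following is a minimal sketch under stated assumptions, not the paper's algorithm: the two-dimensional objective f, the constraint g, the curvature parameter tau, and the helper names solve_surrogate and run are all hypothetical, exact gradients stand in for policy gradient estimators, and each iteration simply jumps to the surrogate solution. At each iterate, f and g are replaced by "linearization plus tau * |d|^2", and the resulting convex problem is solved by bisection on its Lagrange multiplier.

```python
import numpy as np

# Toy nonconvex objective (hypothetical stand-in for a negative expected return).
def f(theta):
    return np.sin(theta[0]) + 0.5 * theta[1] ** 2

def grad_f(theta):
    return np.array([np.cos(theta[0]), theta[1]])

# Toy nonconvex constraint g(theta) <= 0: stay outside the unit disk.
def g(theta):
    return 1.0 - theta @ theta

def grad_g(theta):
    return -2.0 * theta

def solve_surrogate(gf, g0, gg, tau, iters=60):
    """Minimize gf.d + tau*|d|^2 subject to g0 + gg.d + tau*|d|^2 <= 0."""
    def step(lam):
        # Minimizer of the Lagrangian for a fixed multiplier lam >= 0.
        return -(gf + lam * gg) / (2.0 * tau * (1.0 + lam))

    def cons(d):
        # Value of the quadratic constraint surrogate at displacement d.
        return g0 + gg @ d + tau * (d @ d)

    d = step(0.0)
    if cons(d) <= 0.0:
        return d  # constraint inactive: plain quadratic-model step

    # Grow the multiplier until the surrogate constraint is satisfied.
    lo, hi = 0.0, 1.0
    for _ in range(iters):
        if cons(step(hi)) <= 0.0:
            break
        lo, hi = hi, 2.0 * hi
    else:
        # No feasible multiplier found: stay put (safe only if g0 <= 0).
        return np.zeros_like(gf)

    # Bisection; `hi` always satisfies the surrogate constraint.
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if cons(step(mid)) <= 0.0:
            hi = mid
        else:
            lo = mid
    return step(hi)

def run(theta0, tau=1.0, steps=200):
    theta = np.asarray(theta0, dtype=float)
    for _ in range(steps):
        d = solve_surrogate(grad_f(theta), g(theta), grad_g(theta), tau)
        theta = theta + d  # move to the surrogate solution
    return theta

theta_start = np.array([0.5, 1.5])  # feasible start: g(theta_start) = -1.5
theta_final = run(theta_start)
print(theta_final, f(theta_final), g(theta_final))
```

With tau = 1 both quadratic surrogates upper-bound the toy functions, so a feasible start keeps every iterate feasible and the objective does not increase, up to numerical tolerance. With noisy gradient estimates, as in the reinforcement learning setting, that guarantee no longer holds as stated.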

 
