A Policy Gradient Framework for Stochastic Optimal Control Problems with Global Convergence Guarantee

Abstract

We consider policy gradient methods for stochastic optimal control problems in continuous time. In particular, we analyze the gradient flow for the control, viewed as a continuous-time limit of the policy gradient method. We prove the global convergence of the gradient flow and establish a convergence rate under some regularity assumptions. The main novelty in the analysis is the notion of a local optimal control function, which is introduced to characterize the local optimality of the iterate.
