A neural network based policy iteration algorithm with global $H^2$-superlinear convergence for stochastic games on domains

Abstract

In this work, we propose a class of numerical schemes for solving semilinear Hamilton-Jacobi-Bellman-Isaacs (HJBI) boundary value problems which arise naturally from exit time problems of diffusion processes with controlled drift. We exploit policy iteration to reduce the semilinear problem into a sequence of linear Dirichlet problems, which are subsequently approximated by a multilayer feedforward neural network ansatz. We establish that the numerical solutions converge globally in the $H^2$-norm, and further demonstrate that this convergence is superlinear, by interpreting the algorithm as an inexact Newton iteration for the HJBI equation. Moreover, we construct the optimal feedback controls from the numerical value functions and deduce convergence. The numerical schemes and convergence results are then extended to HJBI boundary value problems corresponding to controlled diffusion processes with oblique boundary reflection. Numerical experiments on the stochastic Zermelo navigation problem are presented to illustrate the theoretical results and to demonstrate the effectiveness of the method.
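To make the policy iteration step concrete, the following is a minimal sketch on a toy problem that is not taken from the paper. It assumes a one-dimensional, single-player exit time problem on $(0,1)$ with dynamics $dX = a\,dt + \sqrt{2}\,dW$, controls $a \in [-1,1]$, and running cost 1, so that the value function solves $u'' - |u'| + 1 = 0$ with $u(0) = u(1) = 0$. Each linear Dirichlet problem is solved here with central finite differences in place of the neural network ansatz described above. The function name `policy_iteration_1d` and all parameters are hypothetical.

```python
import numpy as np

def policy_iteration_1d(n=201, tol=1e-10, max_iter=50):
    """Policy iteration for u'' + min_{|a|<=1} (a u') + 1 = 0, u(0)=u(1)=0.

    Toy illustration only: finite differences replace the neural network
    ansatz, and the problem has one controller rather than two players.
    """
    x = np.linspace(0.0, 1.0, n)
    h = x[1] - x[0]
    m = n - 2                       # number of interior nodes
    a = np.zeros(m)                 # initial policy: zero drift
    u = np.zeros(n)
    for k in range(1, max_iter + 1):
        # Policy evaluation: solve the linear problem -u'' - a u' = 1
        # with homogeneous Dirichlet data, using central differences.
        lower = -1.0 / h**2 + a / (2.0 * h)   # coefficient of u_{i-1}
        diag = np.full(m, 2.0 / h**2)         # coefficient of u_i
        upper = -1.0 / h**2 - a / (2.0 * h)   # coefficient of u_{i+1}
        A = np.diag(diag) + np.diag(lower[1:], -1) + np.diag(upper[:-1], 1)
        u_new = np.zeros(n)
        u_new[1:-1] = np.linalg.solve(A, np.ones(m))
        # Policy improvement: a = argmin_{|a|<=1} a * u' = -sign(u').
        du = (u_new[2:] - u_new[:-2]) / (2.0 * h)
        a = -np.sign(du)
        # Stop once successive value functions agree.
        if np.max(np.abs(u_new - u)) < tol:
            return x, u_new, k
        u = u_new
    return x, u, max_iter

x, u, iters = policy_iteration_1d()
print(f"stopped after {iters} iterations, u(0.5) = {u[len(u) // 2]:.5f}")
```

For this toy problem the exact value at the midpoint is $e^{-1/2} - 1/2$, which gives a simple check on the computed solution. The stopping test compares successive value functions rather than policies, because the sign of the discrete derivative at the midpoint is sensitive to rounding.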
