Hamilton-Jacobi Reachability in Reinforcement Learning: A Survey

Abstract

Recent literature has proposed approaches that learn control policies with high performance while maintaining safety guarantees. Synthesizing Hamilton-Jacobi (HJ) reachable sets has become an effective tool for verifying safety and supervising the training of reinforcement learning-based control policies for complex, high-dimensional systems. Previously, HJ reachability was restricted to verifying low-dimensional dynamical systems primarily because the computational complexity of the dynamic programming approach it relied on grows exponentially with the number of system states. In recent years, a litany of proposed methods addresses this limitation by computing the reachability value function simultaneously with learning control policies to scale HJ reachability analysis while still maintaining a reliable estimate of the true reachable set. These HJ reachability approximations are used to improve the safety, and even reward performance, of learned control policies and can solve challenging tasks such as those with dynamic obstacles and/or with lidar-based or vision-based observations. In this survey paper, we review the recent developments in the field of HJ reachability estimation in reinforcement learning that would provide a foundational basis for further research into reliability in high-dimensional systems.
