Stacked Universal Successor Feature Approximators for Safety in Reinforcement Learning

Abstract

Real-world problems often involve complex objective structures that resist distillation into reinforcement learning environments with a single objective. Operation costs must be balanced with multi-dimensional task performance and end-states' effects on future availability, all while ensuring safety for other agents in the environment and the reinforcement learning agent itself. System redundancy through secondary backup controllers has proven to be an effective method to ensure safety in real-world applications where the risk of violating constraints is extremely high. In this work, we investigate the utility of a stacked, continuous-control variation of universal successor feature approximation (USFA) adapted for soft actor-critic (SAC) and coupled with a suite of secondary safety controllers, which we call stacked USFA for safety (SUSFAS). Our method improves performance on secondary objectives compared to SAC baselines using an intervening secondary controller such as a runtime assurance (RTA) controller.
