Minimizing Safety Interference for Safe and Comfortable Automated Driving with Distributional Reinforcement Learning

Abstract

Despite recent advances in reinforcement learning (RL), its application in safety-critical domains like autonomous vehicles is still challenging. Although punishing RL agents for risky situations can help them learn safe policies, it may also lead to highly conservative behavior. In this paper, we propose a distributional RL framework to learn adaptive policies that can tune their level of conservativity at run-time based on the desired comfort and utility. Using a proactive safety verification approach, the proposed framework can guarantee that actions generated by RL are fail-safe under worst-case assumptions. Concurrently, the policy is encouraged to minimize safety interference and generate more comfortable behavior. We trained and evaluated the proposed approach and baseline policies using a high-level simulator with a variety of randomized scenarios, including several corner cases that rarely occur in reality but are crucial. Our experiments show that the behavior of policies learned using distributional RL can be adaptive at run-time and robust to environment uncertainty. Quantitatively, the learned distributional RL agent drives on average 8 seconds faster than the normal DQN policy and requires 83% less safety interference than the rule-based policy, while slightly increasing the average crossing time. We also study the sensitivity of the learned policy in environments with higher perception noise and show that, for automated merging and crossing at occluded intersections, our algorithm learns policies that can still drive reliably when the perception noise is twice as high as in the training configuration.
