Identify, Estimate and Bound the Uncertainty of Reinforcement Learning for Autonomous Driving

Abstract

Deep reinforcement learning (DRL) has emerged as a promising approach for developing more intelligent autonomous vehicles (AVs). A typical DRL application on AVs is to train a neural network-based driving policy. However, the black-box nature of neural networks can result in unpredictable decision failures, making such AVs unreliable. To address this, this work proposes a method to identify unreliable decisions of a DRL driving policy and guard against them. The basic idea is to estimate and constrain the policy's performance uncertainty, which quantifies the potential performance drop due to insufficient training data or network fitting errors. By constraining the uncertainty, the DRL model's performance is kept greater than that of a baseline policy. The uncertainty caused by insufficient data is estimated with the bootstrap method, and the uncertainty caused by network fitting error is estimated using an ensemble network. Finally, a baseline policy is added as the performance lower bound to avoid potential decision failures. The overall framework is called uncertainty-bound reinforcement learning (UBRL). The proposed UBRL is evaluated on DRL policies trained with different amounts of data, taking an unprotected left-turn driving case as an example. The results show that the UBRL method can identify potentially unreliable decisions of a DRL policy. UBRL is guaranteed to outperform the baseline policy even when the DRL policy is not well trained and has high uncertainty. Meanwhile, the performance of UBRL improves with more training data. Such a method is valuable for applying DRL to real-road driving and provides a metric for evaluating a DRL policy.
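The decision rule summarized in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names, the percentile bootstrap, the use of the ensemble standard deviation as a fitting-error proxy, and the way the two uncertainty terms are combined by subtraction are all assumptions made for illustration only.

```python
import numpy as np

def bootstrap_lower_bound(returns, n_resamples=1000, alpha=0.05, seed=0):
    """Lower confidence bound on the mean return (percentile bootstrap).

    Assumption: data-insufficiency uncertainty is represented by the
    spread of resampled mean returns for the DRL action.
    """
    rng = np.random.default_rng(seed)
    returns = np.asarray(returns, dtype=float)
    # Resample with replacement and record the mean of each resample.
    idx = rng.integers(0, len(returns), size=(n_resamples, len(returns)))
    means = returns[idx].mean(axis=1)
    return float(np.quantile(means, alpha))

def ensemble_spread(q_estimates):
    """Standard deviation of value estimates across ensemble members.

    Assumption: disagreement between members is used as a proxy for
    network fitting error.
    """
    return float(np.std(np.asarray(q_estimates, dtype=float)))

def select_action(drl_action, baseline_action,
                  drl_returns, drl_q_ensemble, baseline_value):
    """Use the DRL action only if its pessimistic value beats the baseline."""
    # Hypothetical combination: subtract the ensemble spread from the
    # bootstrap lower bound to obtain a pessimistic value estimate.
    lower = bootstrap_lower_bound(drl_returns) - ensemble_spread(drl_q_ensemble)
    return drl_action if lower > baseline_value else baseline_action

# Few, noisy samples: the pessimistic value falls below the baseline.
uncertain = select_action("turn", "wait", [10, -20, 15, -30], [3.0, -8.0, 1.0], 0.0)
# Many consistent samples and a tight ensemble: the DRL action is kept.
confident = select_action("turn", "wait", [5.0] * 50, [5.0, 5.1, 4.9], 0.0)
print(uncertain, confident)
```

In this sketch the baseline action acts as the fallback whenever the DRL action's pessimistic value estimate does not exceed the baseline value, which mirrors the idea of using the baseline as a performance lower bound.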
