Long and Short-Term Constraints Driven Safe Reinforcement Learning for Autonomous Driving

Abstract

Reinforcement learning (RL) has been widely used in decision-making tasks, but it cannot guarantee the agent's safety during training because it requires interaction with the environment, which seriously limits its industrial applications such as autonomous driving. Safe RL methods have been developed to handle this issue by constraining the expected safety violation costs as a training objective, but they still permit unsafe states to occur, which is unacceptable in autonomous driving tasks. Moreover, these methods struggle to balance the cost and return expectations, which degrades learning performance. In this paper, we propose a novel algorithm based on long and short-term constraints (LSTC) for safe RL. The short-term constraint aims to guarantee the safety of the states the vehicle explores in the short term, while the long-term constraint ensures the overall safety of the vehicle throughout the decision-making process. In addition, we develop a safe RL method with dual-constraint optimization based on the Lagrange multiplier to optimize the training process for end-to-end autonomous driving. Comprehensive experiments were conducted on the MetaDrive simulator. Experimental results demonstrate that the proposed method achieves higher safety in continuous state and action tasks, and exhibits higher exploration performance in long-distance decision-making tasks compared with state-of-the-art methods.
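
To illustrate what a dual-constraint optimization based on Lagrange multipliers can look like, the following is a minimal sketch of generic Lagrangian relaxation with two cost constraints. It is not the algorithm of this paper: the function names, the cost limits, the learning rate, and the exact form of the objective are all assumptions made for illustration, and the abstract does not specify how the long and short-term costs are defined.

```python
# Generic Lagrangian relaxation with two constraints (illustrative only).
# All names and values here are hypothetical, not taken from the paper.


def lagrangian_objective(expected_return, long_cost, short_cost,
                         lam_long, lam_short, long_limit, short_limit):
    """Return minus the multiplier-weighted constraint violations.

    The policy would be trained to maximize this quantity.
    """
    return (expected_return
            - lam_long * (long_cost - long_limit)
            - lam_short * (short_cost - short_limit))


def update_multipliers(lam_long, lam_short, long_cost, short_cost,
                       long_limit, short_limit, lr=0.05):
    """One projected gradient-ascent step on both multipliers.

    A multiplier grows while its constraint is violated (cost > limit)
    and shrinks otherwise; it is clipped at zero to stay non-negative.
    """
    lam_long = max(0.0, lam_long + lr * (long_cost - long_limit))
    lam_short = max(0.0, lam_short + lr * (short_cost - short_limit))
    return lam_long, lam_short


if __name__ == "__main__":
    lam_l, lam_s = update_multipliers(0.0, 0.0, long_cost=2.0, short_cost=0.0,
                                      long_limit=1.0, short_limit=0.5, lr=0.1)
    print(lam_l, lam_s)
    print(lagrangian_objective(10.0, 2.0, 0.0, lam_l, lam_s, 1.0, 0.5))
```

In this kind of scheme, the policy update and the multiplier update alternate: a constraint that is repeatedly violated receives a larger penalty weight, while a satisfied constraint sees its weight decay toward zero.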
