Human-Like Autonomous Car-Following Model with Deep Reinforcement Learning

  • 2019-01-03 01:05:29
  • Meixin Zhu, Xuesong Wang, Yinhai Wang

Abstract

This study proposes a framework for human-like autonomous car-following planning based on deep reinforcement learning (deep RL). Historical driving data are fed into a simulation environment where an RL agent learns from trial-and-error interactions, guided by a reward function that signals how much the agent deviates from the empirical data. Through these interactions, the agent obtains an optimal policy, or car-following model, that maps in a human-like way from speed, relative speed between a lead and following vehicle, and inter-vehicle spacing to the acceleration of the following vehicle. The model can be continuously updated as more data are fed in. Two thousand car-following periods extracted from the 2015 Shanghai Naturalistic Driving Study were used to train the model and compare its performance with that of traditional and recent data-driven car-following models. The results show that a deep deterministic policy gradient car-following model that uses the disparity between simulated and observed speed as the reward function and considers a reaction delay of 1 s, denoted DDPGvRT, reproduces human-like car-following behavior with higher accuracy than traditional and recent data-driven car-following models. Specifically, the DDPGvRT model has a spacing validation error of 18% and a speed validation error of 5%, both lower than those of the other models, including the intelligent driver model, models based on locally weighted regression, and conventional neural network-based models. Moreover, DDPGvRT generalizes well to various driving situations and can adapt to different drivers by learning continuously. This study demonstrates that reinforcement learning methodology can offer insight into driver behavior and can contribute to the development of human-like autonomous driving algorithms and traffic-flow models.
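To make the setup described in the abstract concrete, the following is a minimal Python sketch of one simulation step and a speed-disparity reward. It is an illustration under stated assumptions, not the authors' implementation: the 0.1 s time step, the point-mass kinematic update, the exact form of the reward (negative relative speed error), and all names (`State`, `step`, `speed_reward`, `rollout`) are hypothetical. Only the state variables, the acceleration output, the speed-based reward idea, and the 1 s reaction delay come from the abstract.

```python
from dataclasses import dataclass

DT = 0.1  # simulation time step in seconds (assumed; not given in the abstract)


@dataclass
class State:
    speed: float           # following-vehicle speed (m/s)
    relative_speed: float  # lead speed minus following speed (m/s)
    spacing: float         # inter-vehicle spacing (m)


def step(state, accel, lead_speed_next, dt=DT):
    """Advance the follower one time step with point-mass kinematics (assumed)."""
    next_speed = max(state.speed + accel * dt, 0.0)  # no reversing
    next_relative_speed = lead_speed_next - next_speed
    # Update spacing from the average relative speed over the step.
    next_spacing = state.spacing + 0.5 * (
        state.relative_speed + next_relative_speed
    ) * dt
    return State(next_speed, next_relative_speed, next_spacing)


def speed_reward(sim_speed, obs_speed, eps=1e-6):
    """Hypothetical reward: negative relative gap between simulated and observed speed."""
    return -abs(sim_speed - obs_speed) / max(abs(obs_speed), eps)


def rollout(policy, init, lead_speeds, obs_speeds, delay_steps=10):
    """Simulate one car-following period.

    The policy sees the state from `delay_steps` steps earlier
    (10 steps * 0.1 s = 1 s reaction delay, as in the abstract).
    """
    history = [init]
    total_reward = 0.0
    for lead_v, obs_v in zip(lead_speeds, obs_speeds):
        delayed = history[max(len(history) - 1 - delay_steps, 0)]
        accel = policy(delayed)  # policy maps a state to an acceleration
        nxt = step(history[-1], accel, lead_v)
        total_reward += speed_reward(nxt.speed, obs_v)
        history.append(nxt)
    return history, total_reward
```

In a DDPG setting, `policy` would be the actor network; here any callable that takes a `State` and returns an acceleration in m/s² can stand in for it, which keeps the sketch independent of a particular deep learning library.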

 
