Learning Reward Models for Cooperative Trajectory Planning with Inverse Reinforcement Learning and Monte Carlo Tree Search

  • 2022-05-06 23:57:45
  • Karl Kurzer, Matthias Bitzer, J. Marius Zöllner

Abstract

Cooperative trajectory planning methods for automated vehicles can solve traffic scenarios that require a high degree of cooperation between traffic participants. However, for cooperative systems to integrate into human-centered traffic, the automated systems must behave human-like so that humans can anticipate the system's decisions. While Reinforcement Learning has made remarkable progress in solving the decision-making part, it is non-trivial to parameterize a reward model that yields predictable actions. This work employs feature-based Maximum Entropy Inverse Reinforcement Learning combined with Monte Carlo Tree Search to learn reward models that maximize the likelihood of recorded multi-agent cooperative expert trajectories. The evaluation demonstrates that the approach can recover a reasonable reward model that mimics the expert and performs similarly to a manually tuned baseline reward model.
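
To make the core idea concrete, the following is a minimal sketch of a feature-based Maximum Entropy IRL update, not the authors' implementation. It assumes a reward that is linear in trajectory features and a fixed set of sampled trajectories (in the paper's setting these would come from Monte Carlo Tree Search; here they are simply given as an array). The function names, the learning rate, and the step count are hypothetical choices for illustration.

```python
import numpy as np

def maxent_irl_gradient(theta, expert_features, sampled_features):
    """Gradient of the expert log-likelihood for a linear reward theta . f(trajectory).

    expert_features:  (n_expert, n_features) feature sums of expert trajectories
    sampled_features: (n_samples, n_features) feature sums of sampled trajectories
    """
    # Return of every sampled trajectory under the current reward weights.
    returns = sampled_features @ theta
    # Softmax over the samples approximates the maximum-entropy
    # trajectory distribution p(tau) proportional to exp(theta . f(tau)).
    weights = np.exp(returns - returns.max())
    weights /= weights.sum()
    # Expert feature expectation minus model feature expectation.
    return expert_features.mean(axis=0) - weights @ sampled_features

def fit_reward(expert_features, sampled_features, lr=0.1, steps=500):
    """Plain gradient ascent on the expert log-likelihood (assumed optimizer)."""
    theta = np.zeros(expert_features.shape[1])
    for _ in range(steps):
        theta += lr * maxent_irl_gradient(theta, expert_features, sampled_features)
    return theta

# Toy data: two candidate trajectories with one-hot features;
# the expert always chooses the first one.
sampled = np.array([[1.0, 0.0], [0.0, 1.0]])
expert = np.array([[1.0, 0.0]])
theta = fit_reward(expert, sampled)
```

On this toy data the learned weight for the first feature ends up larger than the weight for the second, so the trajectory the expert chose receives the higher reward. In a full pipeline the sample set would be regenerated by the planner as the weights change; holding it fixed is a simplification made here.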

 
