Learning predictive representations in autonomous driving to improve deep reinforcement learning

Abstract

Reinforcement learning using a novel predictive representation is applied to autonomous driving to accomplish the task of driving between lane markings, where substantial benefits in performance and generalization are observed on unseen test roads both in simulation and on a real Jackal robot. The novel predictive representation is learned by general value functions (GVFs) to provide out-of-policy, or counter-factual, predictions of future lane centeredness and road angle. These predictions form a compact representation of the agent's state that improves both online and offline reinforcement learning, yielding driving methods that generalize well to roads not in the training data. Experiments in both simulation and the real world demonstrate that predictive representations in reinforcement learning improve learning efficiency, smoothness of control, and generalization to roads that the agent was never shown during training, including roads with damaged lane markings. It was found that learning a predictive representation that consists of several predictions over different time scales, or discount factors, substantially improves the performance and smoothness of control. The Jackal robot was trained in a two-step process in which the predictive representation is learned first, followed by a batch reinforcement learning algorithm (BCQ) trained on data collected through both automated and human-guided exploration of the environment. We conclude that out-of-policy predictive representations with GVFs offer reinforcement learning many benefits in real-world problems.
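To make the idea of "several predictions over different time scales" concrete, the following is a minimal sketch rather than the paper's implementation. It computes the discounted-sum quantity that a GVF would be trained to predict for a lane-centeredness signal, at several discount factors, from one recorded trajectory. The function name `gvf_targets`, the sample values, the `(1 - gamma)` scaling, and the convention that the sum starts at the current step are all assumptions made for illustration. The paper learns these predictions out-of-policy, whereas this sketch only evaluates the recorded trajectory.

```python
import numpy as np


def gvf_targets(cumulants, gammas):
    """Discounted-sum targets for one signal, one column per discount factor.

    Hypothetical helper: target[t] = (1 - gamma) * sum_k gamma**k * c[t + k],
    with cumulants beyond the end of the trajectory treated as zero.
    """
    c = np.asarray(cumulants, dtype=float)
    out = np.zeros((len(c), len(gammas)))
    for j, g in enumerate(gammas):
        acc = 0.0
        # Sweep backwards so each step reuses the discounted sum that follows it.
        for t in range(len(c) - 1, -1, -1):
            acc = c[t] + g * acc
            out[t, j] = (1.0 - g) * acc
    return out


# Made-up lane-centeredness readings (1.0 = centered, 0.0 = on the marking).
centeredness = [1.0, 0.9, 0.7, 0.4, 0.0]
targets = gvf_targets(centeredness, gammas=[0.0, 0.5, 0.9])
print(targets.round(3))
```

Each row of `targets` is a small feature vector for one time step: the column with discount 0 simply repeats the current reading, while the larger discounts summarize progressively longer horizons of the same signal.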
