Abstract
Deep reinforcement learning for end-to-end driving is limited by the need for complex reward engineering. Sparse rewards can circumvent this challenge but suffer from long training times and lead to sub-optimal policies. In this work, we explore full-control driving with only a goal-constrained sparse reward and propose a curriculum learning approach for end-to-end driving using only navigation view maps that benefit from a small virtual-to-real domain gap. To address the complexity of multiple driving policies, we learn concurrent individual policies that are selected at inference by a navigation system. We demonstrate the ability of our proposal to generalize to unseen road layouts and to drive significantly longer than in training.