Abstract
Modern approaches to autonomous driving rely heavily on learned components trained with large amounts of human driving data via imitation learning. However, these methods require large amounts of expensive data collection and even then face challenges with safely handling long-tail scenarios and compounding errors over time. At the same time, pure Reinforcement Learning (RL) methods can fail to learn performant policies in sparse, constrained, and challenging-to-define reward settings like driving. Both of these challenges make deploying purely cloned policies in safety-critical applications like autonomous vehicles challenging. In this paper we propose the Combining IMitation and Reinforcement Learning (CIMRL) approach, a framework that enables training driving policies in simulation by leveraging imitative motion priors and safety constraints. CIMRL does not require extensive reward specification and improves on the closed-loop behavior of pure cloning methods. By combining RL and imitation, we demonstrate that our method achieves state-of-the-art results in closed-loop simulation driving benchmarks.