Self-driving vehicles must be able to act intelligently in diverse and difficult environments, marked by high-dimensional state spaces, a myriad of optimization objectives and complex behaviors. Traditionally, classical optimization and search techniques have been applied to the problem of self-driving, but they do not fully address operation in environments with high-dimensional states and complex behaviors. Recently, imitation learning has been proposed for the task of self-driving, but obtaining enough training data is labor-intensive. Reinforcement learning has been proposed as a way to directly control the car, but this raises safety and comfort concerns. We propose using model-free reinforcement learning for the trajectory planning stage of self-driving and show that this approach allows us to operate the car in a safer, more general and more comfortable manner, as required for the task of self-driving.