Autonomous urban driving navigation with complex multi-agent dynamics is under-explored due to the difficulty of learning an optimal driving policy. The traditional modular pipeline relies heavily on hand-designed rules and a pre-processing perception system, while supervised learning-based models are limited by the availability of extensive human experience. We present a general and principled Controllable Imitative Reinforcement Learning (CIRL) approach that enables the driving agent to achieve higher success rates using only vision inputs in a high-fidelity car simulator. To alleviate the low exploration efficiency in large continuous action spaces that often prohibits the use of classical RL on challenging real tasks, CIRL explores over a reasonably constrained action space guided by encoded experiences that imitate human demonstrations, building upon Deep Deterministic Policy Gradient (DDPG). Moreover, we propose to specialize adaptive policies and steering-angle reward designs for different control signals (i.e., follow, straight, turn right, turn left) based on shared representations to improve the model's capability in tackling diverse cases. Extensive experiments on the CARLA driving benchmark demonstrate that CIRL substantially outperforms all previous methods in terms of the percentage of successfully completed episodes on a variety of goal-directed driving tasks. We also show its superior generalization capability in unseen environments. To our knowledge, this is the first successful case of a driving policy learned through reinforcement learning in a high-fidelity simulator that performs better than supervised imitation learning.
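
The idea of specializing policies per control signal on top of a shared representation can be illustrated with a minimal sketch. It is not the CIRL network: the feature size, the random linear heads, the three-component action (steer, throttle, brake), and the names `build_branches` and `act` are all assumptions made for illustration. Only the four control signals come from the text above.

```python
import random

# The four control signals named in the abstract.
COMMANDS = ("follow", "straight", "turn_right", "turn_left")

FEATURE_DIM = 8  # hypothetical size of the shared representation
ACTION_DIM = 3   # assumed action layout: (steer, throttle, brake)


def build_branches(seed=0):
    """Create one small linear head per command (random weights, illustration only)."""
    rng = random.Random(seed)
    return {
        command: [
            [rng.uniform(-0.1, 0.1) for _ in range(FEATURE_DIM)]
            for _ in range(ACTION_DIM)
        ]
        for command in COMMANDS
    }


def clamp(value, low, high):
    return max(low, min(high, value))


def act(branches, features, command):
    """Pick the head that matches the command and map shared features to an action."""
    if command not in branches:
        raise ValueError(f"unknown control signal: {command!r}")
    head = branches[command]
    raw = [sum(w * x for w, x in zip(row, features)) for row in head]
    steer = clamp(raw[0], -1.0, 1.0)    # assumed steering range
    throttle = clamp(raw[1], 0.0, 1.0)  # assumed throttle range
    brake = clamp(raw[2], 0.0, 1.0)     # assumed brake range
    return steer, throttle, brake


branches = build_branches()
shared_features = [1.0] * FEATURE_DIM  # stand-in for the vision encoder output
for command in COMMANDS:
    print(command, act(branches, shared_features, command))
```

In this sketch the same feature vector yields a different action for each command, because only the head selected by the control signal is evaluated. How the heads are trained (imitation followed by DDPG with command-specific rewards) is outside the scope of the sketch.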