Autonomous vehicles need to handle various traffic conditions and make safe and efficient decisions and maneuvers. However, on the one hand, a single optimization/sampling-based motion planner cannot efficiently generate safe trajectories in real time, particularly when there are many interactive vehicles nearby. On the other hand, end-to-end learning methods cannot assure the safety of their outcomes. To address this challenge, we propose a hierarchical behavior planning framework (H-CtRL) that combines a set of low-level safe controllers with a high-level reinforcement learning algorithm acting as a coordinator for those controllers. Safety is guaranteed by the low-level optimization/sampling-based controllers, while the high-level reinforcement learning algorithm makes H-CtRL an adaptive and efficient behavior planner. To train and test the proposed algorithm, we built a simulator that can reproduce traffic scenes from real-world datasets. The proposed H-CtRL is shown to be effective in various realistic simulation scenarios, with satisfactory performance in terms of both safety and efficiency.
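
As a rough illustration of the hierarchy described above, the following sketch pairs a high-level learner that picks among low-level controllers with controllers that each enforce their own safety check. It is not the H-CtRL implementation: the state representation, the three controllers (`cruise`, `accelerate`, `yield_`), the `safe_clip` rule, the `MIN_GAP` margin, and the use of tabular epsilon-greedy Q-learning are all assumptions chosen to keep the example small.

```python
import random
from typing import Callable, Dict, List, Tuple

# Hypothetical state: (ego speed in m/s, gap to lead vehicle in m).
State = Tuple[float, float]

MIN_GAP = 5.0  # assumed safety margin in metres, for illustration only


def safe_clip(accel: float, state: State) -> float:
    """Stand-in for a low-level safety check: force braking if the gap is too small."""
    _, gap = state
    return min(accel, -2.0) if gap < MIN_GAP else accel


def cruise(state: State) -> float:
    return safe_clip(0.0, state)


def accelerate(state: State) -> float:
    return safe_clip(1.5, state)


def yield_(state: State) -> float:
    return safe_clip(-1.0, state)


# The high-level policy only chooses an index into this list.
CONTROLLERS: List[Callable[[State], float]] = [cruise, accelerate, yield_]


class Coordinator:
    """Tabular epsilon-greedy Q-learning over controller indices (assumed RL choice)."""

    def __init__(self, n_actions: int, epsilon: float = 0.1,
                 alpha: float = 0.5, gamma: float = 0.9, seed: int = 0) -> None:
        self.n_actions = n_actions
        self.epsilon, self.alpha, self.gamma = epsilon, alpha, gamma
        self.rng = random.Random(seed)
        self.q: Dict[Tuple[int, int], List[float]] = {}

    def _values(self, state: State) -> List[float]:
        # Discretise the state into 5-unit bins so it can index a table.
        key = (int(state[0] // 5), int(state[1] // 5))
        return self.q.setdefault(key, [0.0] * self.n_actions)

    def select(self, state: State) -> int:
        values = self._values(state)
        if self.rng.random() < self.epsilon:
            return self.rng.randrange(self.n_actions)
        return max(range(self.n_actions), key=values.__getitem__)

    def update(self, state: State, action: int, reward: float, next_state: State) -> None:
        values = self._values(state)
        target = reward + self.gamma * max(self._values(next_state))
        values[action] += self.alpha * (target - values[action])


def plan_step(coordinator: Coordinator, state: State) -> Tuple[int, float]:
    """One planning step: the coordinator picks a controller, which returns a checked command."""
    idx = coordinator.select(state)
    return idx, CONTROLLERS[idx](state)
```

The point of this split is that the learned component never emits a control command directly. Whatever index the coordinator selects, the command still passes through the controller's own safety check, so exploration at the high level cannot bypass it.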