Abstract
In this paper, we present a method that integrates reinforcement learning into Monte Carlo tree search to accelerate online path planning in fully observable environments for automated parking tasks. Sampling-based planning methods in high-dimensional spaces can be computationally expensive and time-consuming. State evaluation methods help by incorporating prior knowledge into the search steps, making the process faster in a real-time system. Because automated parking tasks are often executed in complex environments, solid but lightweight heuristic guidance is challenging to compose in a traditional analytical way. To overcome this limitation, we propose a reinforcement learning pipeline with Monte Carlo tree search within the path planning framework. By iteratively learning the value of a state and the best action among samples from the previous cycle's outcomes, we are able to model a value estimator and a policy generator for given states. In doing so, we build a mechanism that balances exploration and exploitation, speeding up the path planning process while maintaining its quality, without using human expert driver data.