### Abstract

In this paper, we present a hierarchical path-planning framework called SG-RL (subgoal graphs-reinforcement learning) that plans rational paths for agents maneuvering in continuous and uncertain environments. By "rational", we mean (1) efficient path planning that eliminates first-move lags, and (2) collision-free, smooth paths that satisfy the agents' kinematic constraints. SG-RL works in a two-level manner. At the first level, SG-RL uses a geometric path-planning method, Simple Subgoal Graphs (SSG), to efficiently find optimal abstract paths, also called subgoal sequences. At the second level, SG-RL uses an RL method, Least-Squares Policy Iteration (LSPI), to learn near-optimal motion-planning policies that generate kinematically feasible and collision-free trajectories between adjacent subgoals. The first advantage of the proposed method is that SSG overcomes the sparse-reward and local-minimum problems that RL agents face; thus, LSPI can be used to generate paths in complex environments. The second advantage is that, when the environment changes slightly (i.e., when unexpected obstacles appear), SG-RL does not need to reconstruct subgoal graphs and replan subgoal sequences using SSG, since LSPI can handle such uncertainties through its generalization ability. Simulation experiments in representative scenarios demonstrate that, compared with existing methods, SG-RL works well on large-scale maps with relatively low action-switching frequencies and shorter path lengths, and that it can deal with small changes in environments. We further demonstrate that the design of reward functions and the types of training environments are important factors for learning feasible policies.
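The two-level structure described above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration, not the paper's implementation: the hand-built graph `SUBGOALS`/`EDGES`, the function names, and the step size are hypothetical. A plain A* search over that graph stands in for SSG, and a straight-line stepper stands in for the policy that LSPI would learn. Obstacles and kinematic constraints are not modeled.

```python
import heapq
import math

# Hypothetical subgoal graph: name -> (x, y), plus undirected edges.
# In SSG the subgoals would be derived from the map; here they are hand-picked.
SUBGOALS = {"S": (0.0, 0.0), "A": (4.0, 0.0), "B": (4.0, 3.0),
            "C": (0.0, 5.0), "G": (8.0, 3.0)}
EDGES = [("S", "A"), ("A", "B"), ("S", "C"), ("C", "B"), ("B", "G")]


def dist(p, q):
    """Euclidean distance between two 2-D points."""
    return math.hypot(p[0] - q[0], p[1] - q[1])


def plan_subgoal_sequence(start, goal):
    """First level: A* over the subgoal graph, returning a list of subgoal names."""
    neighbors = {name: [] for name in SUBGOALS}
    for u, v in EDGES:
        neighbors[u].append(v)
        neighbors[v].append(u)
    # Heap entries: (f = g + heuristic, g = cost so far, node, path so far).
    frontier = [(dist(SUBGOALS[start], SUBGOALS[goal]), 0.0, start, [start])]
    best_cost = {start: 0.0}
    while frontier:
        _, g, node, path = heapq.heappop(frontier)
        if node == goal:
            return path
        for nxt in neighbors[node]:
            g_next = g + dist(SUBGOALS[node], SUBGOALS[nxt])
            if g_next < best_cost.get(nxt, math.inf):
                best_cost[nxt] = g_next
                f_next = g_next + dist(SUBGOALS[nxt], SUBGOALS[goal])
                heapq.heappush(frontier, (f_next, g_next, nxt, path + [nxt]))
    return None  # goal not reachable


def straight_line_policy(pos, target, step=0.5):
    """Stand-in for a learned local policy: move at most `step` toward the target."""
    d = dist(pos, target)
    if d <= step:
        return target
    return (pos[0] + step * (target[0] - pos[0]) / d,
            pos[1] + step * (target[1] - pos[1]) / d)


def follow_subgoals(path, policy, max_steps=1000):
    """Second level: run the local policy between each pair of adjacent subgoals."""
    pos = SUBGOALS[path[0]]
    trajectory = [pos]
    for name in path[1:]:
        target = SUBGOALS[name]
        for _ in range(max_steps):
            if pos == target:
                break
            pos = policy(pos, target)
            trajectory.append(pos)
    return trajectory


if __name__ == "__main__":
    subgoal_path = plan_subgoal_sequence("S", "G")
    trajectory = follow_subgoals(subgoal_path, straight_line_policy)
    print(subgoal_path, trajectory[-1])
```

The point of the split is that the high-level search only ever sees a handful of subgoals, while the local policy only ever has to reach a nearby target. In this sketch, swapping `straight_line_policy` for a learned policy would not require touching the graph search.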
