Abstract Value Iteration for Hierarchical Reinforcement Learning

Abstract

We propose a novel hierarchical reinforcement learning framework for control with continuous state and action spaces. In our framework, the user specifies subgoal regions which are subsets of states; then, we (i) learn options that serve as transitions between these subgoal regions, and (ii) construct a high-level plan in the resulting abstract decision process (ADP). A key challenge is that the ADP may not be Markov, which we address by proposing two algorithms for planning in the ADP. Our first algorithm is conservative, allowing us to prove theoretical guarantees on its performance, which help inform the design of subgoal regions. Our second algorithm is a practical one that interweaves planning at the abstract level and learning at the concrete level. In our experiments, we demonstrate that our approach outperforms state-of-the-art hierarchical reinforcement learning algorithms on several challenging benchmarks.
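To make the idea of planning over subgoal regions concrete, the following is a minimal illustrative sketch, not the algorithms proposed in this paper. It runs ordinary value iteration on a small hand-written abstract graph and assumes, for simplicity, that the abstract process is Markov, which is exactly the assumption the abstract says may fail. The function name `abstract_value_iteration`, the region names, the success probabilities, the rewards, and the rule that a failed option ends the episode with zero return are all hypothetical choices made for illustration.

```python
from typing import Dict, Tuple

# Hypothetical abstract graph: nodes are subgoal regions, edges are options.
# Each edge carries an ASSUMED success probability and expected reward;
# none of these numbers come from the paper.
Edge = Tuple[float, float]  # (success_probability, expected_reward)


def abstract_value_iteration(
    edges: Dict[str, Dict[str, Edge]],
    goal: str,
    gamma: float = 0.9,
    tol: float = 1e-8,
    max_iters: int = 10_000,
) -> Tuple[Dict[str, float], Dict[str, str]]:
    """Plain value iteration over subgoal regions, assuming a Markov abstraction.

    Assumption: if an option fails, the episode ends with zero further return.
    """
    nodes = set(edges) | {t for out in edges.values() for t in out} | {goal}
    values = {n: 0.0 for n in nodes}

    def backup(p: float, r: float, target: str) -> float:
        # Expected return of taking one option toward `target`.
        return p * (r + gamma * values[target])

    for _ in range(max_iters):
        delta = 0.0
        for s in nodes:
            out = edges.get(s)
            if s == goal or not out:
                continue  # the goal and dead-end regions keep value 0
            best = max(backup(p, r, t) for t, (p, r) in out.items())
            delta = max(delta, abs(best - values[s]))
            values[s] = best
        if delta < tol:
            break

    # Greedy high-level plan: the best next subgoal region from each region.
    policy = {
        s: max(out, key=lambda t: backup(out[t][0], out[t][1], t))
        for s, out in edges.items()
        if out and s != goal
    }
    return values, policy


# Toy example with made-up numbers.
example_edges = {
    "start": {"A": (0.9, 0.0), "B": (0.5, 0.0)},
    "A": {"goal": (0.8, 1.0)},
    "B": {"goal": (1.0, 1.0)},
}
example_values, example_policy = abstract_value_iteration(example_edges, goal="goal")
print(example_policy["start"], round(example_values["start"], 3))
```

In this toy graph the planner prefers the route through `A`: the option reaching `B` is less reliable, and that outweighs the fact that `B` reaches the goal with certainty. When the abstraction is not Markov, a single probability per edge no longer captures the outcome of an option, since it can depend on where inside the region the option starts.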
