Hierarchical Reinforcement Learning: Approximating Optimal Discounted TSP Using Local Policies

Abstract

In this work, we provide theoretical guarantees for reward decomposition indeterministic MDPs. Reward decomposition is a special case of HierarchicalReinforcement Learning, that allows one to learn many policies in parallel andcombine them into a composite solution. Our approach builds on mapping thisproblem into a Reward Discounted Traveling Salesman Problem, and then derivingapproximate solutions for it. In particular, we focus on approximate solutionsthat are local, i.e., solutions that only observe information about the currentstate. Local policies are easy to implement and do not require substantialcomputational resources as they do not perform planning. While localdeterministic policies, like Nearest Neighbor, are being used in practice forhierarchical reinforcement learning, we propose three stochastic policies thatguarantee better performance than any deterministic policy.

Quick Read (beta)

loading the full paper ...