Floyd-Warshall Reinforcement Learning: Learning from Past Experiences to Reach New Goals

Abstract

Consider multi-goal tasks that involve static environments and dynamic goals. Examples of such tasks, including goal-directed navigation and pick-and-place in robotics, abound. Two types of Reinforcement Learning (RL) algorithms are used for such tasks: model-free and model-based. Each of these approaches has limitations. Model-free RL struggles to transfer learned information when the goal location changes, but achieves high asymptotic accuracy in single-goal tasks. Model-based RL can transfer learned information to new goal locations by retaining the explicitly learned state dynamics, but is limited by the fact that small errors in modelling these dynamics accumulate over long-term planning. In this work, we address the limitations of model-free RL in multi-goal domains. We do this by adapting the Floyd-Warshall algorithm for RL and call the adaptation Floyd-Warshall RL (FWRL). The proposed algorithm learns a goal-conditioned action-value function by constraining the value of the optimal path between any two states to be greater than or equal to the value of paths via intermediary states. Experimentally, we show that FWRL is more sample-efficient and learns higher-reward strategies in multi-goal tasks as compared to Q-learning, model-based RL and other relevant baselines in a tabular domain.
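The path constraint described in the abstract can be illustrated with a small tabular sketch. This is not the paper's implementation: it assumes deterministic transitions, a constant step reward of -1, and a fully observed four-state chain. The table layout `F[s][a][g]` and the names `init_table`, `fw_relax` and `greedy_action` are hypothetical, chosen only for this illustration.

```python
NEG_INF = float("-inf")
LEFT, RIGHT = 0, 1


def init_table(n_states, n_actions, transitions, step_reward=-1.0):
    """Build F[s][a][g]: best known value of taking action a in s to reach g.

    Assumption: every observed transition (s, a, s_next) is deterministic
    and earns the same step reward. Unobserved entries stay at -inf.
    """
    F = [[[NEG_INF] * n_states for _ in range(n_actions)]
         for _ in range(n_states)]
    for s, a, s_next in transitions:
        F[s][a][s_next] = max(F[s][a][s_next], step_reward)
    return F


def fw_relax(F):
    """Floyd-Warshall style sweep enforcing, for every intermediary k,
    F[s][a][g] >= F[s][a][k] + max_a' F[k][a'][g]."""
    n_states = len(F)
    n_actions = len(F[0])
    for k in range(n_states):
        # Best value of continuing from k to each goal g.
        best_from_k = [max(F[k][a][g] for a in range(n_actions))
                       for g in range(n_states)]
        for s in range(n_states):
            for a in range(n_actions):
                to_k = F[s][a][k]
                if to_k == NEG_INF:
                    continue  # no known path from (s, a) to k
                for g in range(n_states):
                    via_k = to_k + best_from_k[g]
                    if via_k > F[s][a][g]:
                        F[s][a][g] = via_k
    return F


def greedy_action(F, s, g):
    """Pick the action with the highest goal-conditioned value."""
    return max(range(len(F[s])), key=lambda a: F[s][a][g])


# Four-state chain 0 - 1 - 2 - 3 with observed moves in both directions.
transitions = [(s, RIGHT, s + 1) for s in range(3)]
transitions += [(s, LEFT, s - 1) for s in range(1, 4)]
F = fw_relax(init_table(4, 2, transitions))

print(F[0][RIGHT][3])          # value of heading right from 0 toward goal 3
print(greedy_action(F, 3, 0))  # action chosen in state 3 when the goal is 0
```

Because the table is indexed by goal, the same `F` answers queries for any goal state without relearning, which is the transfer property the abstract attributes to FWRL. In a learning setting the entries would be estimated from sampled experience rather than filled in from a known transition list.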
