Learning Skills to Navigate without a Master: A Sequential Multi-Policy Reinforcement Learning Algorithm

  • 2022-08-07 09:00:24
  • Ambedkar Dukkipati, Rajarshi Banerjee, Ranga Shaarad Ayyagari, Dhaval Parmar Udaybhai

Abstract

Solving complex problems using reinforcement learning necessitates breaking down the problem into manageable tasks and learning policies to solve these tasks. These policies, in turn, have to be controlled by a master policy that takes high-level decisions. Hence learning policies involves hierarchical decision structures. However, training such methods in practice may lead to poor generalization, with either sub-policies executing actions for too few time steps or devolving into a single policy altogether. In our work, we introduce an alternative approach to learn such skills sequentially without using an overarching hierarchical policy. We propose this method in the context of environments where a major component of the objective of a learning agent is to prolong the episode for as long as possible. We refer to our proposed method as Sequential Soft Option Critic. We demonstrate the utility of our approach on navigation and goal-based tasks in a flexible simulated 3D navigation environment that we have developed. We also show that our method outperforms prior methods such as Soft Actor-Critic and Soft Option Critic on various environments, including the Atari River Raid environment and the Gym-Duckietown self-driving car simulator.

 
