Subwords as Skills: Tokenization for Sparse-Reward Reinforcement Learning

Abstract

Exploration in sparse-reward reinforcement learning is difficult because achieving any reward requires long, coordinated sequences of actions. Moreover, continuous action spaces contain infinitely many possible actions, which makes exploration even harder. One class of methods designed to address these issues forms temporally extended actions, often called skills, from interaction data collected in the same domain, and optimizes a policy on top of this new action space. Such methods typically require a lengthy pretraining phase, especially in continuous action spaces, to form the skills before reinforcement learning can begin. Given prior evidence that the full range of the continuous action space is not required in such tasks, we propose a novel approach to skill generation with two components. First, we discretize the action space through clustering; second, we leverage a tokenization technique borrowed from natural language processing to generate temporally extended actions. This method outperforms baselines for skill generation in several challenging sparse-reward domains, and requires orders of magnitude less computation in skill generation and online rollouts.
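The two-component pipeline described above can be illustrated with a minimal sketch. The abstract names only "clustering" and "a tokenization technique borrowed from natural language processing", so this sketch assumes k-means (with a deterministic farthest-point initialization) for the first step and standard byte-pair encoding (BPE) merges for the second. The function names `kmeans`, `discretize`, `learn_bpe`, and `merge_pair` are hypothetical, chosen for illustration, and this is not the authors' implementation.

```python
from collections import Counter


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length action vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, iters=20):
    """Lloyd's k-means (an assumed clustering choice) with farthest-point init."""
    centers = [points[0]]
    while len(centers) < k:
        # Next center: the point farthest from every center chosen so far.
        centers.append(max(points, key=lambda p: min(sq_dist(p, c) for c in centers)))
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda j: sq_dist(p, centers[j]))
            clusters[nearest].append(p)
        for j, members in enumerate(clusters):
            if members:  # keep the old center if a cluster empties out
                centers[j] = tuple(sum(xs) / len(members) for xs in zip(*members))
    return centers


def discretize(actions, centers):
    """Step 1: map each continuous action to the id of its nearest cluster center."""
    return [min(range(len(centers)), key=lambda j: sq_dist(a, centers[j])) for a in actions]


def merge_pair(seq, x, y):
    """Replace every non-overlapping adjacent occurrence of (x, y) with x + y."""
    out, i = [], 0
    while i < len(seq):
        if i + 1 < len(seq) and seq[i] == x and seq[i + 1] == y:
            out.append(x + y)
            i += 2
        else:
            out.append(seq[i])
            i += 1
    return out


def learn_bpe(sequences, num_merges):
    """Step 2 (assumed BPE): learn merges over sequences of discrete action ids.

    Each token is a tuple of primitive action ids; every learned merge is a
    candidate skill, i.e. a temporally extended action.
    """
    seqs = [[(a,) for a in seq] for seq in sequences]
    merges = []
    for _ in range(num_merges):
        pairs = Counter()
        for seq in seqs:
            for x, y in zip(seq, seq[1:]):
                pairs[(x, y)] += 1
        if not pairs:
            break
        (x, y), count = pairs.most_common(1)[0]
        if count < 2:  # stop once no adjacent pair repeats
            break
        merges.append(x + y)
        seqs = [merge_pair(seq, x, y) for seq in seqs]
    return merges


if __name__ == "__main__":
    # Toy demonstrations, already discretized into action ids.
    demos = [[0, 1, 0, 1, 0, 1], [0, 1, 2]]
    print(learn_bpe(demos, num_merges=2))  # [(0, 1), (0, 1, 0, 1)]
```

In this sketch, the resulting action vocabulary would be the `k` primitive cluster ids plus the learned merges, each of which a policy could select as a single open-loop macro-action. How the paper sizes or prunes that vocabulary is not stated in the abstract.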
