Training Agents using Upside-Down Reinforcement Learning

Abstract

We develop Upside-Down Reinforcement Learning (UDRL), a method for learning to act using only supervised learning techniques. Unlike traditional algorithms, UDRL does not use reward prediction or search for an optimal policy. Instead, it trains agents to follow commands such as "obtain so much total reward in so much time." Many of its general principles are outlined in a companion report. The goal of this paper is to develop a practical learning algorithm and to show that this conceptually simple perspective on agent training can produce a range of rewarding behaviors in multiple episodic environments. Experiments show that on some tasks UDRL's performance can be surprisingly competitive with, and even exceed, that of some traditional baseline algorithms developed over decades of research. Based on these results, we suggest that alternatives to expected reward maximization have an important role to play in training useful autonomous agents.
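
To make "follow commands such as 'obtain so much total reward in so much time'" concrete, the sketch below shows one way a finished episode could be turned into supervised training examples. It assumes that each time step is relabeled, in hindsight, with the total reward actually obtained from that step to the end of the episode and with the number of steps that remained. The function name `episode_to_examples` and the `(state, command, action)` layout are hypothetical choices made for illustration, not the paper's published interface.

```python
from typing import List, Sequence, Tuple

# A command is (desired_return, desired_horizon), i.e.
# "obtain so much total reward in so much time".
Command = Tuple[float, int]


def episode_to_examples(
    states: Sequence[object],
    actions: Sequence[int],
    rewards: Sequence[float],
) -> List[Tuple[object, Command, int]]:
    """Turn one finished episode into (state, command) -> action examples.

    Hypothetical helper (an assumption, not the paper's API): each step is
    labeled with the return and horizon that were actually achieved from
    that step onward, so the recorded action becomes a supervised target.
    """
    if not (len(states) == len(actions) == len(rewards)):
        raise ValueError("states, actions and rewards must have equal length")

    horizon = len(rewards)
    examples: List[Tuple[object, Command, int]] = []
    return_to_go = 0.0
    # Walk backwards so the remaining reward accumulates in a single pass.
    for t in reversed(range(horizon)):
        return_to_go += rewards[t]
        command: Command = (return_to_go, horizon - t)
        examples.append((states[t], command, actions[t]))
    examples.reverse()  # restore chronological order
    return examples


if __name__ == "__main__":
    demo = episode_to_examples(["s0", "s1", "s2"], [0, 1, 0], [1.0, 0.0, 2.0])
    for state, (desired_return, desired_horizon), action in demo:
        print(state, desired_return, desired_horizon, action)
```

Under these assumptions, any ordinary classifier that maps a state plus a command to an action could be fit on these pairs with standard supervised learning, with no reward prediction or policy search involved. At test time, the agent would be given a command of its choosing rather than one observed in hindsight.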
