CHIRPs: Change-Induced Regret Proxy metrics for Lifelong Reinforcement Learning

Abstract

Reinforcement learning agents can achieve superhuman performance in static tasks but are costly to train and fragile to task changes. This limits their deployment in real-world scenarios where training experience is expensive or the context changes through factors like sensor degradation, environmental processes or changing mission priorities. Lifelong reinforcement learning aims to improve sample efficiency and adaptability by studying how agents perform in evolving problems. The difficulty that these changes pose to an agent is rarely measured directly, however. Agent performances can be compared across a change, but this is often prohibitively expensive. We propose Change-Induced Regret Proxy (CHIRP) metrics, a class of metrics for approximating a change's difficulty while avoiding the high costs of using trained agents. A relationship between a CHIRP metric and agent performance is identified in two environments, a simple grid world and MetaWorld's suite of robotic arm tasks. We demonstrate two uses for these metrics: for learning, an agent that clusters MDPs based on a CHIRP metric achieves $17\%$ higher average returns than three existing agents in a sequence of MetaWorld tasks. We also show how a CHIRP can be calibrated to compare the difficulty of changes across distinctly different environments.
