CHIRPs: Change-Induced Regret Proxy metrics for Lifelong Reinforcement Learning

Abstract

Reinforcement learning (RL) agents are costly to train and fragile to environmental changes. They often perform poorly when there are many changing tasks, prohibiting their widespread deployment in the real world. Many Lifelong RL agent designs have been proposed to mitigate issues such as catastrophic forgetting or demonstrate positive characteristics like forward transfer when change occurs. However, no prior work has established whether the impact on agent performance can be predicted from the change itself. Understanding this relationship will help agents proactively mitigate a change's impact for improved learning performance. We propose Change-Induced Regret Proxy (CHIRP) metrics to link change to agent performance drops and use two environments to demonstrate a CHIRP's utility in lifelong learning. A simple CHIRP-based agent achieved $48\%$ higher performance than the next best method in one benchmark and attained the best success rates in 8 of 10 tasks in a second benchmark which proved difficult for existing lifelong RL agents.
