Position: Lifetime tuning is incompatible with continual reinforcement learning

Abstract

In continual RL we want agents capable of never-ending learning, and yet our evaluation methodologies do not reflect this. The standard practice in RL is to assume unfettered access to the deployment environment for the full lifetime of the agent. For example, agent designers select the best-performing hyperparameters in Atari by testing each for 200 million frames and then reporting results on 200 million frames. In this position paper, we argue against this inappropriate empirical methodology, which we call lifetime tuning, and demonstrate its pitfalls. We provide empirical evidence to support our position by testing DQN and SAC across several continuing and non-stationary environments, with two main findings: (1) lifetime tuning does not allow us to identify algorithms that work well for continual learning -- all algorithms succeed equally; (2) recently developed continual RL algorithms outperform standard non-continual algorithms when tuning is limited to a fraction of the agent's lifetime. The goal of this paper is to provide an explanation for why recent progress in continual RL has been mixed and to motivate the development of empirical practices that better match the goals of continual RL.
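
The contrast between lifetime tuning and tuning on a fraction of the lifetime can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the names `toy_run_agent`, `mean_reward`, `select_hyperparameter`, `LIFETIME`, and `CANDIDATES` are hypothetical, and the reward curve is synthetic (a large step size learns quickly but degrades over time, a small one learns slowly but holds up), standing in for a real DQN or SAC run.

```python
from typing import Callable, List, Sequence

LIFETIME = 1000            # hypothetical agent lifetime, in steps
CANDIDATES = [0.1, 0.01]   # hypothetical step sizes to choose between


def toy_run_agent(step_size: float, num_steps: int) -> List[float]:
    """Synthetic per-step rewards; a stand-in for a real RL run (assumption).

    The first term rises faster for larger step sizes; the second term is a
    penalty that grows over time and hurts larger step sizes more.
    """
    return [
        min(1.0, step_size * t) - 0.002 * step_size * t
        for t in range(num_steps)
    ]


def mean_reward(
    step_size: float,
    num_steps: int,
    run_agent: Callable[[float, int], List[float]] = toy_run_agent,
) -> float:
    """Average reward of one configuration over the first `num_steps` steps."""
    rewards = run_agent(step_size, num_steps)
    return sum(rewards) / len(rewards)


def select_hyperparameter(
    candidates: Sequence[float],
    run_agent: Callable[[float, int], List[float]],
    tuning_steps: int,
) -> float:
    """Pick the candidate with the highest mean reward within the tuning budget."""
    return max(candidates, key=lambda h: mean_reward(h, tuning_steps, run_agent))


# Lifetime tuning: the tuning budget equals the full lifetime.
lifetime_choice = select_hyperparameter(CANDIDATES, toy_run_agent, LIFETIME)

# Fraction tuning: only the first 10% of the lifetime is available for tuning.
fraction_choice = select_hyperparameter(CANDIDATES, toy_run_agent, LIFETIME // 10)

# Both choices are then evaluated over the full lifetime.
for name, choice in [("lifetime tuning", lifetime_choice),
                     ("10% tuning", fraction_choice)]:
    print(name, choice, round(mean_reward(choice, LIFETIME), 3))
```

In this toy setting the two protocols select different step sizes: the short tuning budget favors the configuration that learns fastest early on, and that configuration does worse over the full lifetime. The sketch only shows how the tuning budget enters the selection step; it does not reproduce the experiments described in the abstract.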
