Continual Model-Based Reinforcement Learning with Hypernetworks

Abstract

Effective planning in model-based reinforcement learning (MBRL) and model-predictive control (MPC) relies on the accuracy of the learned dynamics model. In many instances of MBRL and MPC, this model is assumed to be stationary and is periodically re-trained from scratch on state transition experience collected from the beginning of environment interactions. This implies that the time required to train the dynamics model, and the pause required between plan executions, grows linearly with the size of the collected experience. We argue that this is too slow for lifelong robot learning and propose HyperCRL, a method that continually learns the encountered dynamics in a sequence of tasks using task-conditional hypernetworks. Our method has three main attributes: first, it includes dynamics learning sessions that do not revisit training data from previous tasks, so it only needs to store the most recent fixed-size portion of the state transition experience; second, it uses fixed-capacity hypernetworks to represent non-stationary and task-aware dynamics; third, it outperforms existing continual learning alternatives that rely on fixed-capacity networks, and performs competitively with baselines that remember an ever-increasing coreset of past experience. We show that HyperCRL is effective in continual model-based reinforcement learning in robot locomotion and manipulation scenarios, such as tasks involving pushing and door opening. Our project website with videos is at https://rvl.cs.toronto.edu/blog/2020/hypercrl
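
To make the idea of a task-conditional hypernetwork concrete, the following is a minimal sketch and not the HyperCRL implementation. It assumes a purely linear hypernetwork and a linear residual dynamics model, neither of which is specified in the abstract; all names (`make_hypernetwork`, `generate_dynamics_params`, `predict_next_state`) and dimensions are hypothetical. The point it illustrates is that one fixed-size set of hypernetwork parameters maps a per-task embedding to the weights of a dynamics model, so adding a task adds only a small embedding rather than a new network.

```python
import random

def make_hypernetwork(embed_dim, state_dim, action_dim, seed=0):
    """Create fixed-size hypernetwork parameters (assumed linear map, for illustration)."""
    rng = random.Random(seed)
    in_dim = state_dim + action_dim
    # The target dynamics model has a weight matrix W (state_dim x in_dim) and a bias b.
    n_target = state_dim * in_dim + state_dim
    return [[rng.gauss(0.0, 0.1) for _ in range(embed_dim)] for _ in range(n_target)]

def generate_dynamics_params(H, task_embedding, state_dim, action_dim):
    """Map a task embedding to the dynamics model's parameters (W, b)."""
    flat = [sum(h * e for h, e in zip(row, task_embedding)) for row in H]
    in_dim = state_dim + action_dim
    W = [flat[i * in_dim:(i + 1) * in_dim] for i in range(state_dim)]
    b = flat[state_dim * in_dim:]
    return W, b

def predict_next_state(W, b, state, action):
    """Residual linear dynamics (an assumption): s' = s + W [s; a] + b."""
    x = list(state) + list(action)
    return [s + sum(w * xi for w, xi in zip(row, x)) + bi
            for s, row, bi in zip(state, W, b)]

# Hypothetical dimensions and task embeddings, chosen only for the example.
STATE_DIM, ACTION_DIM, EMBED_DIM = 3, 2, 4
H = make_hypernetwork(EMBED_DIM, STATE_DIM, ACTION_DIM)
task_embeddings = {
    "pushing": [1.0, 0.0, 0.5, -0.5],
    "door_opening": [0.0, 1.0, -0.5, 0.5],
}

state, action = [1.0, 2.0, 3.0], [0.1, -0.2]
for task, embedding in task_embeddings.items():
    W, b = generate_dynamics_params(H, embedding, STATE_DIM, ACTION_DIM)
    print(task, predict_next_state(W, b, state, action))
```

In this sketch the shared parameters `H` never grow with the number of tasks; only the dictionary of embeddings does. How those parameters are trained without forgetting earlier tasks is outside what the abstract describes and is not shown here.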
