Sharing Lifelong Reinforcement Learning Knowledge via Modulating Masks

Abstract

Lifelong learning agents aim to learn multiple tasks sequentially over a lifetime. This involves the ability to exploit previous knowledge when learning new tasks and to avoid forgetting. Modulating masks, a specific type of parameter isolation approach, have recently shown promise in both supervised and reinforcement learning. While lifelong learning algorithms have been investigated mainly within a single-agent approach, a question remains on how multiple agents can share lifelong learning knowledge with each other. We show that the parameter isolation mechanism used by modulating masks is particularly suitable for exchanging knowledge among agents in a distributed and decentralized system of lifelong learners. The key idea is that the isolation of specific task knowledge to specific masks allows agents to transfer only specific knowledge on-demand, resulting in robust and effective distributed lifelong learning. We assume fully distributed and asynchronous scenarios with dynamic agent numbers and connectivity. An on-demand communication protocol ensures agents query their peers for specific masks to be transferred and integrated into their policies when facing each task. Experiments indicate that on-demand mask communication is an effective way to implement distributed lifelong reinforcement learning and provides a lifelong learning benefit with respect to distributed RL baselines such as DD-PPO, IMPALA, and PPO+EWC. The system is particularly robust to connection drops and demonstrates rapid learning due to knowledge exchange.
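
The on-demand exchange described above can be illustrated with a toy sketch. This is not the paper's implementation: the `Agent` class, its method names, the binary masks, the scalar performance score, and the assumption that all agents share the same fixed backbone weights are hypothetical choices made only for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

# A stored mask: (binary mask over backbone weights, performance score).
# Both the binary form and the scalar score are illustrative assumptions.
MaskEntry = Tuple[List[int], float]


@dataclass
class Agent:
    name: str
    backbone: List[float]  # assumed fixed and identical across agents
    masks: Dict[str, MaskEntry] = field(default_factory=dict)
    peers: List["Agent"] = field(default_factory=list)

    def handle_query(self, task_id: str) -> Optional[MaskEntry]:
        """Answer a peer's request: return only the mask for that task."""
        return self.masks.get(task_id)

    def query_peers(self, task_id: str) -> Optional[MaskEntry]:
        """Ask each peer for a mask for task_id and keep the best one seen."""
        best = self.masks.get(task_id)
        for peer in self.peers:
            try:
                reply = peer.handle_query(task_id)
            except ConnectionError:
                continue  # a dropped connection just skips that peer
            if reply is not None and (best is None or reply[1] > best[1]):
                best = reply
        if best is not None:
            self.masks[task_id] = best
        return best

    def effective_weights(self, task_id: str) -> List[float]:
        """Modulate the backbone elementwise with the task's mask."""
        mask, _score = self.masks[task_id]
        return [w * m for w, m in zip(self.backbone, mask)]


shared_backbone = [0.5, -1.0, 2.0, 0.25]
teacher = Agent("teacher", shared_backbone, masks={"task_1": ([1, 0, 1, 1], 0.9)})
learner = Agent("learner", shared_backbone, peers=[teacher])

received = learner.query_peers("task_1")   # learner has no mask yet, so it asks
weights = learner.effective_weights("task_1")
missing = learner.query_peers("task_2")    # no peer knows task_2
```

Because each task's knowledge lives in its own mask, the learner receives only the entry for `task_1` and nothing else from the teacher's store, and a query that no peer can answer simply leaves the learner to train from scratch.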
