Multiple Modes for Continual Learning

Abstract

Adapting model parameters to incoming streams of data is crucial to the scalability of deep learning. Interestingly, prior continual learning strategies in online settings inadvertently anchor their updated parameters to a local parameter subspace in order to remember old tasks; otherwise, the parameters drift away from that subspace and the old tasks are forgotten. From this observation, we formulate a trade-off between constructing multiple parameter modes and allocating tasks per mode. Mode-Optimized Task Allocation (MOTA), our contributed adaptation strategy, trains multiple modes in parallel, then optimizes task allocation per mode. We empirically demonstrate improvements over baseline continual learning strategies across varying distribution shifts, namely sub-population, domain, and task shift.
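The trade-off between the number of modes and the number of tasks per mode can be made concrete with a toy sketch in Python. This is not the MOTA procedure, whose details are not given in the abstract. The function name `allocate_tasks`, the per-task, per-mode loss matrix, and the equal-capacity rule are all hypothetical assumptions chosen for illustration. Tasks arrive in order, and each is placed in the best-fitting mode that still has room. Adding modes lowers the number of tasks any single mode has to hold.

```python
import math


def allocate_tasks(loss, num_modes):
    """Toy greedy assignment of tasks to parameter modes (hypothetical).

    loss[t][m] is an assumed score for how poorly mode m fits task t
    (lower is better). Each mode holds at most ceil(T / M) tasks, so
    with more modes each mode is responsible for fewer tasks.
    """
    num_tasks = len(loss)
    capacity = math.ceil(num_tasks / num_modes)
    counts = [0] * num_modes
    allocation = []
    for t in range(num_tasks):
        # Visit modes from best to worst fit for this task.
        order = sorted(range(num_modes), key=lambda m: loss[t][m])
        for m in order:
            if counts[m] < capacity:
                counts[m] += 1
                allocation.append(m)
                break
    return allocation


# Four tasks, two modes: each mode can hold at most two tasks.
loss = [[0.1, 0.9], [0.2, 0.8], [0.3, 0.7], [0.9, 0.1]]
print(allocate_tasks(loss, num_modes=2))  # → [0, 0, 1, 1]
```

In this run, task 2 prefers mode 0 but lands in mode 1 because mode 0 is already full. The greedy, capacity-constrained rule is only one simple way to trade off "more modes" against "more tasks per mode". An optimal assignment solver would be a natural alternative.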
