Model-based adaptation for sample efficient transfer in reinforcement learning control of parameter-varying systems

Abstract

In this paper, we leverage ideas from model-based control to address the sample efficiency problem of reinforcement learning (RL) algorithms. Accelerating learning is an active field of RL highly relevant in the context of time-varying systems. Traditional transfer learning methods propose to use prior knowledge of the system behavior to devise a gradual or immediate data-driven transformation of the control policy obtained through RL. Such a transformation is usually computed by estimating the performance of previous control policies based on measurements recently collected from the system. However, such retrospective measures have debatable utility with no guarantees of positive transfer in most cases. Instead, we propose a model-based transformation, such that when actions from a control policy are applied to the target system, a positive transfer is achieved. The transformation can be used as an initialization for the reinforcement learning process to converge to a new optimum. We validate the performance of our approach through four benchmark examples. We demonstrate that our approach is more sample-efficient than fine-tuning with reinforcement learning alone and achieves comparable performance to linear-quadratic regulators and model-predictive control when an accurate linear model is known in the three cases. If an accurate model is not known, we empirically show that the proposed approach still guarantees positive transfer with jump-start improvement.
