Learning to Reweight Imaginary Transitions for Model-Based Reinforcement Learning

Abstract

Model-based reinforcement learning (RL) is more sample efficient than model-free RL because it uses imaginary trajectories generated by the learned dynamics model. When the model is inaccurate or biased, imaginary trajectories may be deleterious for training the action-value and policy functions. To alleviate this problem, this paper proposes to adaptively reweight the imaginary transitions, so as to reduce the negative effects of poorly generated trajectories. More specifically, we evaluate the effect of an imaginary transition by calculating the change in the loss computed on the real samples when we use the transition to train the action-value and policy functions. Based on this evaluation criterion, we realize the reweighting of each imaginary transition with a well-designed meta-gradient algorithm. Extensive experimental results demonstrate that our method outperforms state-of-the-art model-based and model-free RL algorithms on multiple tasks. Visualization of how the weights change further validates the necessity of a reweighting scheme.
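
The evaluation criterion described above (how much the loss on real samples changes when an imaginary transition is used for training) can be illustrated with a minimal sketch. This is not the paper's algorithm: it assumes a linear regressor standing in for the action-value function, a squared-error loss, and one free weight per imaginary sample updated directly by its meta-gradient. All function and parameter names (`per_sample_grads`, `reweight_step`, `lr`, `meta_lr`) are hypothetical.

```python
import numpy as np

def per_sample_grads(theta, X, y):
    """Gradient of 0.5 * (x . theta - y)^2 w.r.t. theta, one row per sample."""
    residual = X @ theta - y
    return residual[:, None] * X

def real_loss(theta, X_real, y_real):
    """Mean squared-error loss on the real samples."""
    return 0.5 * np.mean((X_real @ theta - y_real) ** 2)

def reweight_step(theta, w, X_im, y_im, X_real, y_real, lr=0.1, meta_lr=1.0):
    """One illustrative meta-gradient reweighting step (assumed setup).

    1. Take a virtual gradient step on the weighted imaginary loss.
    2. Differentiate the real-sample loss at the virtual parameters
       with respect to each imaginary weight.
    3. Lower the weights of samples that would increase the real loss,
       then apply the actual update with the new weights.
    """
    n_im = X_im.shape[0]
    g_im = per_sample_grads(theta, X_im, y_im)            # shape (n_im, d)
    theta_virtual = theta - lr * (w @ g_im) / n_im        # virtual update
    g_real = per_sample_grads(theta_virtual, X_real, y_real).mean(axis=0)
    # Chain rule: d real_loss / d w_i = g_real . (-lr / n_im * g_im[i])
    dL_dw = -(lr / n_im) * (g_im @ g_real)
    w_new = np.clip(w - meta_lr * dL_dw, 0.0, 1.0)        # keep weights in [0, 1]
    theta_new = theta - lr * (w_new @ g_im) / n_im        # actual update
    return theta_new, w_new, dL_dw
```

In this sketch, a positive entry of `dL_dw` means that putting more weight on that imaginary sample would raise the real-sample loss, so its weight is reduced. A negative entry means the sample helps, so its weight is increased.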
