Can Learned Optimization Make Reinforcement Learning Less Difficult?

Abstract

While reinforcement learning (RL) holds great potential for decision making in the real world, it suffers from a number of unique difficulties which often need specific consideration. In particular: it is highly non-stationary; suffers from high degrees of plasticity loss; and requires exploration to prevent premature convergence to local optima and maximize return. In this paper, we consider whether learned optimization can help overcome these problems. Our method, Learned Optimization for Plasticity, Exploration and Non-stationarity (OPEN), meta-learns an update rule whose input features and output structure are informed by previously proposed solutions to these difficulties. We show that our parameterization is flexible enough to enable meta-learning in diverse learning contexts, including the ability to use stochasticity for exploration. Our experiments demonstrate that when meta-trained on single and small sets of environments, OPEN outperforms or equals traditionally used optimizers. Furthermore, OPEN shows strong generalization across a distribution of environments and a range of agent architectures.
