SUNRISE: A Simple Unified Framework for Ensemble Learning in Deep Reinforcement Learning

Abstract

Model-free deep reinforcement learning (RL) has been successful in a range of challenging domains. However, there are some remaining issues, such as stabilizing the optimization of nonlinear function approximators, preventing error propagation due to the Bellman backup in Q-learning, and efficient exploration. To mitigate these issues, we present SUNRISE, a simple unified ensemble method, which is compatible with various off-policy RL algorithms. SUNRISE integrates three key ingredients: (a) bootstrap with random initialization, which improves the stability of the learning process by training a diverse ensemble of agents, (b) weighted Bellman backups, which prevent error propagation in Q-learning by reweighing sample transitions based on uncertainty estimates from the ensembles, and (c) an inference method that selects actions using highest upper-confidence bounds for efficient exploration. Our experiments show that SUNRISE significantly improves the performance of existing off-policy RL algorithms, such as Soft Actor-Critic and Rainbow DQN, for both continuous and discrete control tasks on both low-dimensional and high-dimensional environments. Our training code is available at https://github.com/pokaxpoka/sunrise.
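To make the three ingredients concrete, the following is a minimal NumPy sketch, not the authors' implementation. The abstract does not state any formulas, so everything specific here is an assumption made for illustration: the function names, the Bernoulli masking with `keep_prob`, the weight form `sigmoid(-std * temperature) + 0.5`, the UCB score `mean + lam * std`, and all default values.

```python
import numpy as np


def bootstrap_masks(rng, n_members, batch_size, keep_prob=0.5):
    """(a) One Bernoulli mask per ensemble member (assumed scheme).

    Each member is updated only on the transitions its mask selects, so
    members see different data in addition to different random inits.
    """
    return rng.random((n_members, batch_size)) < keep_prob


def backup_weights(target_q, temperature=10.0):
    """(b) Per-transition weights from ensemble disagreement (assumed form).

    target_q has shape (n_members, batch_size). The weight is
    sigmoid(-std * temperature) + 0.5, so it lies in (0.5, 1.0] and
    shrinks as the ensemble's target estimates disagree more.
    """
    std = target_q.std(axis=0)
    return 1.0 / (1.0 + np.exp(std * temperature)) + 0.5


def ucb_action(q_values, lam=1.0):
    """(c) Pick the action with the highest upper-confidence bound.

    q_values has shape (n_members, n_actions); the score per action is
    the ensemble mean plus lam times the ensemble standard deviation.
    """
    mean = q_values.mean(axis=0)
    std = q_values.std(axis=0)
    return int(np.argmax(mean + lam * std))
```

In this sketch, a transition on which all members agree gets weight 1.0, while a strongly disputed one approaches 0.5, so uncertain targets are down-weighted rather than discarded. Setting `lam=0` in `ucb_action` reduces it to greedy selection on the ensemble mean.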
