Maximum Entropy Model-based Reinforcement Learning

Abstract

Recent advances in reinforcement learning have demonstrated its ability to solve hard agent-environment interaction tasks at a super-human level. However, the application of reinforcement learning methods to practical, real-world tasks is currently limited by the sample inefficiency of most state-of-the-art RL algorithms, i.e., their need for a vast number of training episodes. For example, the OpenAI Five algorithm, which has beaten human players in Dota 2, trained for thousands of years of game time. Several approaches tackle the issue of sample inefficiency: they either make more efficient use of already gathered experience or aim to gain more relevant and diverse experience through better exploration of the environment. However, to our knowledge, no such approach exists for model-based algorithms, which have shown high sample efficiency in solving hard control tasks with high-dimensional state spaces. This work connects exploration techniques and model-based reinforcement learning. We have designed a novel exploration method that takes into account features of the model-based approach. We also demonstrate through experiments that our method significantly improves the performance of the model-based algorithm Dreamer.
