Assessing Policy, Loss and Planning Combinations in Reinforcement Learning using a New Modular Architecture

Abstract

The model-based reinforcement learning paradigm, which uses planning algorithms and neural network models, has recently achieved unprecedented results in diverse applications, leading to what is now known as deep reinforcement learning. These agents are quite complex and involve multiple components, factors that can create challenges for research. In this work, we propose a new modular software architecture suited for these types of agents, and a set of building blocks that can be easily reused and assembled to construct new model-based reinforcement learning agents. These building blocks include planning algorithms, policies, and loss functions. We illustrate the use of this architecture by combining several of these building blocks to implement and test agents that are optimized for three different test environments: Cartpole, Minigrid, and Tictactoe. One particular planning algorithm, made available in our implementation and not previously used in reinforcement learning, which we called averaged minimax, achieved good results in the three tested environments. Experiments performed with this architecture have shown that the best combination of planning algorithm, policy, and loss function is heavily problem-dependent. This result provides evidence that the proposed architecture, which is modular and reusable, is useful for reinforcement learning researchers who want to study new environments and techniques.
