Policy Augmentation: An Exploration Strategy for Faster Convergence of Deep Reinforcement Learning Algorithms

Abstract

Despite advancements in deep reinforcement learning algorithms, developing an effective exploration strategy is still an open problem. Most existing exploration strategies either are based on simple heuristics, or require the model of the environment, or train additional deep neural networks to generate imagination-augmented paths. In this paper, a revolutionary algorithm, called Policy Augmentation, is introduced. Policy Augmentation is based on a newly developed inductive matrix completion method. The proposed algorithm augments the values of unexplored state-action pairs, helping the agent take actions that will result in high-value returns while the agent is in the early episodes. Training deep reinforcement learning algorithms with high-value rollouts leads to the faster convergence of deep reinforcement learning algorithms. Our experiments show the superior performance of Policy Augmentation. The code can be found at: https://github.com/arashmahyari/PolicyAugmentation.
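To make the idea of augmenting the values of unexplored state-action pairs concrete, the following is a minimal sketch and not the paper's method. It swaps in plain low-rank matrix completion by iterative truncated SVD in place of the inductive matrix completion the abstract refers to. The toy value table, the rank-1 assumption, and the names `complete_low_rank`, `true_q`, and `observed` are all hypothetical and chosen only for illustration.

```python
import numpy as np


def complete_low_rank(q, observed, rank=1, n_iters=200):
    """Estimate unobserved entries of q with a rank-`rank` fit.

    Assumption: plain iterative truncated SVD (hard impute), used here as a
    stand-in for the inductive matrix completion method named in the abstract.
    """
    # Start by filling unexplored entries with the mean of the explored ones.
    filled = np.where(observed, q, q[observed].mean())
    for _ in range(n_iters):
        u, s, vt = np.linalg.svd(filled, full_matrices=False)
        low_rank = (u[:, :rank] * s[:rank]) @ vt[:rank]
        # Keep explored values fixed; update only the unexplored entries.
        filled = np.where(observed, q, low_rank)
    return filled


# Hypothetical toy table: 4 states x 3 actions, exactly rank 1.
true_q = np.outer(np.array([1.0, 2.0, 3.0, 4.0]), np.array([1.0, 0.5, 2.0]))

# Two state-action pairs have not been explored yet.
observed = np.ones_like(true_q, dtype=bool)
observed[0, 2] = False
observed[2, 1] = False

# Unexplored pairs default to a value of zero before augmentation.
q_partial = np.where(observed, true_q, 0.0)
q_aug = complete_low_rank(q_partial, observed, rank=1)

greedy_before = q_partial.argmax(axis=1)  # greedy actions without augmentation
greedy_after = q_aug.argmax(axis=1)       # greedy actions with augmented values
```

In this toy setting, the unexplored pair in state 0 starts with a default value of zero, so a greedy agent would never try it. After completion its estimated value can exceed the explored alternatives, which is the kind of early guidance toward high-value actions that the abstract describes.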
