Novelty Search for Deep Reinforcement Learning Policy Network Weights by Action Sequence Edit Metric Distance

Abstract

Reinforcement learning (RL) problems often feature deceptive local optima, and learning methods that optimize purely for the reward signal often fail to learn strategies for overcoming them. Deep neuroevolution and novelty search have been proposed as effective alternatives to gradient-based methods for learning RL policies directly from pixels. In this paper, we introduce and evaluate the use of novelty search over agent action sequences, measured by string edit metric distance, as a means of promoting innovation. We also introduce a method for stagnation detection and population resampling, inspired by recent developments in the RL community, that uses the same mechanisms as novelty search to promote and develop innovative policies. Our methods extend a state-of-the-art approach to deep neuroevolution that uses a simple-yet-effective genetic algorithm (GA) designed to efficiently learn deep RL policy network weights. We conducted experiments on four games from the Atari 2600 benchmark. The results provide further evidence that GAs are competitive with gradient-based algorithms for deep RL. They also demonstrate that novelty search over action sequences is an effective source of selection pressure that can be integrated into existing evolutionary algorithms for deep RL.
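The abstract does not specify how an edit distance is turned into a novelty score. A common novelty search formulation takes the mean distance from a candidate's behaviour to its k nearest neighbours among previously seen behaviours. The sketch below assumes that formulation, uses unit-cost Levenshtein distance as the string edit metric, and represents each action sequence as a list of integer action indices. The names `edit_distance`, `novelty`, and `k` are hypothetical illustrations, not the paper's code.

```python
from typing import List, Sequence


def edit_distance(a: Sequence[int], b: Sequence[int]) -> int:
    """Levenshtein distance between two action sequences.

    Insertions, deletions, and substitutions each cost 1.
    """
    # prev[j] holds the distance between the first i-1 actions of a and the first j of b
    prev = list(range(len(b) + 1))
    for i in range(1, len(a) + 1):
        curr = [i] + [0] * len(b)
        for j in range(1, len(b) + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,         # delete a[i-1]
                curr[j - 1] + 1,     # insert b[j-1]
                prev[j - 1] + cost,  # substitute (or match)
            )
        prev = curr
    return prev[len(b)]


def novelty(candidate: Sequence[int], others: List[Sequence[int]], k: int = 3) -> float:
    """Mean edit distance from candidate to its k nearest neighbours in others.

    Assumption: `others` is the current population plus an archive of past
    behaviours; returns 0.0 when there is nothing to compare against.
    """
    if not others:
        return 0.0
    dists = sorted(edit_distance(candidate, o) for o in others)
    nearest = dists[:k]
    return sum(nearest) / len(nearest)


if __name__ == "__main__":
    archive = [[0, 1, 2, 2], [3, 3, 3, 3], [0, 0, 0, 0]]
    # Distances are 1, 3, 3; the two nearest give (1 + 3) / 2 = 2.0
    print(novelty([0, 1, 2, 3], archive, k=2))
```

Under this assumed scoring, a policy whose action sequence differs from its nearest neighbours by more insertions, deletions, or substitutions receives a higher novelty score, which a GA could then use as selection pressure alongside, or instead of, game reward.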
