Derivative-Free Reinforcement Learning: A Review

Abstract

Reinforcement learning is about learning agent models that make the best sequential decisions in unknown environments. In an unknown environment, the agent needs to explore the environment while exploiting the collected information, which usually forms a sophisticated problem to solve. Derivative-free optimization, meanwhile, is capable of solving sophisticated problems. It commonly uses a sampling-and-updating framework to iteratively improve the solution, where exploration and exploitation also need to be well balanced. Therefore, derivative-free optimization deals with a core issue similar to that of reinforcement learning, and has been introduced into reinforcement learning approaches under the names of learning classifier systems and neuroevolution/evolutionary reinforcement learning. Although such methods have been developed for decades, derivative-free reinforcement learning has recently been attracting increasing attention. However, a recent survey on this topic is still lacking. In this article, we summarize methods of derivative-free reinforcement learning to date, and organize them by aspects including parameter updating, model selection, exploration, and parallel/distributed methods. Moreover, we discuss some current limitations and possible future directions, hoping that this article can bring more attention to this topic and serve as a catalyst for developing novel and efficient approaches.
