Neuroevolution is a Competitive Alternative to Reinforcement Learning for Skill Discovery

  • 2023-09-08 10:33:41
  • Felix Chalumeau, Raphael Boige, Bryan Lim, Valentin Macé, Maxime Allard, Arthur Flajolet, Antoine Cully, Thomas Pierrot

Abstract

Deep Reinforcement Learning (RL) has emerged as a powerful paradigm for training neural policies to solve complex control tasks. However, these policies tend to be overfit to the exact specifications of the task and environment they were trained on, and thus do not perform well when conditions deviate slightly or when composed hierarchically to solve even more complex tasks. Recent work has shown that training a mixture of policies, as opposed to a single one, that are driven to explore different regions of the state-action space can address this shortcoming by generating a diverse set of behaviors, referred to as skills, that can be collectively used to great effect in adaptation tasks or for hierarchical planning. This is typically realized by including a diversity term - often derived from information theory - in the objective function optimized by RL. However, these approaches often require careful hyperparameter tuning to be effective. In this work, we demonstrate that less widely-used neuroevolution methods, specifically Quality Diversity (QD), are a competitive alternative to information-theory-augmented RL for skill discovery. Through an extensive empirical evaluation comparing eight state-of-the-art algorithms (four flagship algorithms from each line of work) on the basis of (i) metrics directly evaluating the skills' diversity, (ii) the skills' performance on adaptation tasks, and (iii) the skills' performance when used as primitives for hierarchical planning, QD methods are found to provide equal, and sometimes improved, performance whilst being less sensitive to hyperparameters and more scalable. As no single method is found to provide near-optimal performance across all environments, there is a rich scope for further research which we support by proposing future directions and providing optimized open-source implementations.

 
