Qualitative Differences Between Evolutionary Strategies and Reinforcement Learning Methods for Control of Autonomous Agents

Abstract

In this paper we analyze the qualitative differences between evolutionary strategies and reinforcement learning algorithms by focusing on two popular state-of-the-art algorithms: the OpenAI-ES evolutionary strategy and the Proximal Policy Optimization (PPO) reinforcement learning algorithm, the most similar methods of the two families. We analyze how the methods differ with respect to: (i) general efficacy, (ii) ability to cope with sparse rewards, (iii) propensity/capacity to discover minimal solutions, (iv) dependency on reward shaping, and (v) ability to cope with variations of the environmental conditions. The analysis of the performance and of the behavioral strategies displayed by the agents trained with the two methods on benchmark problems enables us to demonstrate qualitative differences which were not identified in previous studies, to identify the relative weaknesses of the two methods, and to propose ways to ameliorate some of those weaknesses. We show that the characteristics of the reward function have a strong impact which varies qualitatively not only for OpenAI-ES and PPO but also for alternative reinforcement learning algorithms, thus demonstrating the importance of optimizing the characteristics of the reward function for the algorithm used.
