Multi-parameter Control for the (1+($λ$,$λ$))-GA on OneMax via Deep Reinforcement Learning

Abstract

It is well known that evolutionary algorithms can benefit from dynamicchoices of the key parameters that control their behavior, to adjust theirsearch strategy to the different stages of the optimization process. Aprominent example where dynamic parameter choices have shown a provablesuper-constant speed-up is the $(1+(\lambda,\lambda))$ Genetic Algorithmoptimizing the OneMax function. While optimal parameter control policies resultin linear expected running times, this is not possible with static parameterchoices. This result has spurred a lot of interest in parameter controlpolicies. However, many works, in particular theoretical running time analyses,focus on controlling one single parameter. Deriving policies for controllingmultiple parameters remains very challenging. In this work we reconsider theproblem of the $(1+(\lambda,\lambda))$ Genetic Algorithm optimizing OneMax. Wedecouple its four main parameters and investigate how well state-of-the-artdeep reinforcement learning techniques can approximate good control policies.We show that although making deep reinforcement learning learn effectively is achallenging task, once it works, it is very powerful and is able to findpolicies that outperform all previously known control policies on the samebenchmark. Based on the results found through reinforcement learning, we derivea simple control policy that consistently outperforms the defaulttheory-recommended setting by $27\%$ and the irace-tuned policy, the strongestexisting control policy on this benchmark, by $13\%$, for all tested problemsizes up to $40{,}000$.

Quick Read (beta)

loading the full paper ...