Abstract
Reinforcement Learning's high sensitivity to hyperparameters is a source of instability and inefficiency, creating significant challenges for practitioners. Hyperparameter Optimization (HPO) algorithms have been developed to address this issue. Among them, Population-Based Training (PBT) stands out for its ability to generate hyperparameter schedules instead of fixed configurations. PBT trains a population of agents, each with its own hyperparameters, frequently ranking them and replacing the worst performers with mutations of the best agents. These intermediate selection steps can cause PBT to focus on short-term improvements, leading it to get stuck in local optima and eventually fall behind vanilla Random Search over longer timescales. This paper studies how this greediness issue is connected to the choice of evolution frequency, the rate at which selection is performed. We propose Multiple-Frequencies Population-Based Training (MF-PBT), a novel HPO algorithm that addresses greediness by employing sub-populations, each evolving at a distinct frequency. MF-PBT introduces a migration process to transfer information between sub-populations, with an asymmetric design to balance short- and long-term optimization. Extensive experiments on the Brax suite demonstrate that MF-PBT improves sample efficiency and long-term performance, even without actually tuning hyperparameters.
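
To make the selection step and the notion of evolution frequency concrete, the following is a minimal sketch of a generic PBT loop. It is not the MF-PBT implementation. The function names (`pbt_step`, `train`), the agent dictionary layout, the 25% truncation fraction, and the perturbation factors 0.8 and 1.25 are illustrative assumptions, not values taken from this paper.

```python
import random


def pbt_step(population, frac=0.25, perturb=(0.8, 1.25), rng=random):
    """One selection step on a population of at least two agents.

    Each agent is a dict with keys "weights", "hparams" and "score"
    (a hypothetical layout chosen for this sketch). The worst `frac`
    of agents are overwritten by mutated copies of agents drawn from
    the best `frac`.
    """
    ranked = sorted(population, key=lambda a: a["score"], reverse=True)
    k = max(1, int(len(ranked) * frac))
    top, bottom = ranked[:k], ranked[-k:]
    for loser in bottom:
        winner = rng.choice(top)
        # Exploit: copy the winner's weights and last known score.
        loser["weights"] = list(winner["weights"])
        loser["score"] = winner["score"]
        # Explore: perturb each copied hyperparameter by a random factor.
        loser["hparams"] = {
            name: value * rng.choice(perturb)
            for name, value in winner["hparams"].items()
        }
    return population


def train(population, total_steps, evolve_every, train_one_step):
    """Train all agents, running a selection step every `evolve_every` steps.

    `evolve_every` sets the evolution frequency: a small value selects
    often (greedier), a large value selects rarely.
    `train_one_step` is a user-supplied callable that updates one agent in place.
    """
    for step in range(1, total_steps + 1):
        for agent in population:
            train_one_step(agent)
        if step % evolve_every == 0:
            pbt_step(population)
    return population
```

In this sketch a single `evolve_every` value governs the whole population. The approach described above instead runs several sub-populations, each with its own value, and adds migration between them, which this example does not attempt to reproduce.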