Abstract
Training intelligent agents through reinforcement learning is a notoriously unstable procedure. Massive parallelization on GPUs and distributed systems has been exploited to generate large amounts of training experience and consequently reduce instabilities, but the success of training remains strongly influenced by the choice of hyperparameters. To overcome this issue, we introduce HyperTrick, a new metaoptimization algorithm, and show its effective application to hyperparameter tuning for deep reinforcement learning, while learning to play different Atari games on a distributed system. Our analysis provides evidence of the interaction between the identification of the optimal hyperparameters and the learned policy, which is typical of metaoptimization for deep reinforcement learning. Compared with state-of-the-art metaoptimization algorithms, HyperTrick has a simpler implementation and learns similar policies, while making more effective use of the computational resources in a distributed system.