Hyperparameter Auto-tuning in Self-Supervised Robotic Learning

Abstract

Policy optimization in reinforcement learning requires the selection of numerous hyperparameters across different environments. Setting them incorrectly can degrade optimization performance, leading notably to insufficient or redundant learning. Insufficient learning (due to convergence to local optima) results in under-performing policies, whilst redundant learning wastes time and resources. These effects are further exacerbated when a single policy is used to solve multi-task learning problems. Observing that the Evidence Lower Bound (ELBO) used in Variational Auto-Encoders (VAEs) correlates with the diversity of image samples, we propose an ELBO-based auto-tuning technique for self-supervised reinforcement learning. Our approach auto-tunes three hyperparameters: the replay buffer size, the number of policy gradient updates per epoch, and the number of exploration steps per epoch. We use a state-of-the-art self-supervised robot learning framework, Reinforcement Learning with Imagined Goals (RIG) with Soft Actor-Critic, as the baseline for experimental verification. Experiments show that our method can auto-tune online and yields the best performance at a fraction of the time and computational resources. Code, video, and an appendix for the simulated and real-robot experiments are available on the project page: www.JuanRojas.net/autotune.
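For readers unfamiliar with the quantity the method builds on, the sketch below computes a per-sample VAE ELBO, i.e. the reconstruction log-likelihood minus the KL divergence from the approximate posterior to the prior. It is illustrative only and rests on assumptions not stated in the abstract: a Bernoulli decoder over pixel intensities in [0, 1], a diagonal Gaussian posterior parameterized by `mu` and `logvar`, and a standard normal prior. The function names are hypothetical, and the sketch does not show how the ELBO is mapped to the three hyperparameters, which the abstract does not specify.

```python
import math
from typing import Sequence


def gaussian_kl(mu: Sequence[float], logvar: Sequence[float]) -> float:
    """Analytic KL( N(mu, diag(exp(logvar))) || N(0, I) ), summed over latent dims."""
    return 0.5 * sum(m * m + math.exp(lv) - lv - 1.0 for m, lv in zip(mu, logvar))


def bernoulli_log_likelihood(
    x: Sequence[float], probs: Sequence[float], eps: float = 1e-7
) -> float:
    """log p(x | z) under an (assumed) Bernoulli decoder with per-pixel probabilities."""
    total = 0.0
    for xi, p in zip(x, probs):
        p = min(max(p, eps), 1.0 - eps)  # clamp so log() never sees 0
        total += xi * math.log(p) + (1.0 - xi) * math.log(1.0 - p)
    return total


def elbo(
    x: Sequence[float],
    probs: Sequence[float],
    mu: Sequence[float],
    logvar: Sequence[float],
) -> float:
    """Single-sample ELBO: reconstruction log-likelihood minus KL to the prior."""
    return bernoulli_log_likelihood(x, probs) - gaussian_kl(mu, logvar)


if __name__ == "__main__":
    # Toy "image" of two pixels, a decoder output, and a 1-D latent posterior.
    print(elbo([1.0, 0.0], [0.9, 0.1], [0.2], [-0.5]))
```

In practice one would average this value over a batch of images drawn from the replay buffer; a higher ELBO indicates that the VAE models those samples well, which is the signal the abstract relates to sample diversity.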
