Hyperparameter Auto-tuning in Self-Supervised Robotic Learning

Abstract

Policy optimization in reinforcement learning requires the selection of numerous hyperparameters across different environments. Setting them incorrectly may degrade optimization performance, leading notably to insufficient or redundant learning. Insufficient learning (due to convergence to local optima) results in under-performing policies, whilst redundant learning wastes time and resources. The effects are further exacerbated when using single policies to solve multi-task learning problems. In this paper, we study how the Evidence Lower Bound (ELBO) used in Variational Auto-Encoders (VAEs) is affected by the diversity of image samples. Different tasks or setups in visual reinforcement learning incur varying diversity. We exploit the ELBO to create an auto-tuning technique in self-supervised reinforcement learning. Our approach can auto-tune three hyperparameters: the replay buffer size, the number of policy gradient updates during each epoch, and the number of exploration steps during each epoch. We use the state-of-the-art self-supervised robotic learning framework (Reinforcement Learning with Imagined Goals (RIG) using Soft Actor-Critic) as the baseline for experimental verification. Experiments show that our method can auto-tune online and yields the best performance at a fraction of the time and computational resources. Code, video, and appendix for simulated and real-robot experiments can be found at \url{www.JuanRojas.net/autotune}.
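
For readers unfamiliar with the quantity the abstract builds on, the following is a minimal sketch of a single-sample ELBO estimate for a VAE. It assumes a diagonal Gaussian encoder, a standard normal prior, and a unit-variance Gaussian decoder with the additive constant dropped. The function names `gaussian_kl` and `elbo` and the `beta` weight are illustrative assumptions, not taken from the paper or its code, and the sketch does not reproduce the paper's auto-tuning rule.

```python
import math


def gaussian_kl(mu, log_var):
    """KL( N(mu, diag(exp(log_var))) || N(0, I) ), summed over latent dimensions."""
    return 0.5 * sum(
        math.exp(lv) + m * m - 1.0 - lv for m, lv in zip(mu, log_var)
    )


def elbo(x, x_recon, mu, log_var, beta=1.0):
    """Single-sample ELBO estimate: reconstruction log-likelihood minus weighted KL.

    Assumes a unit-variance Gaussian decoder, so the log-likelihood reduces to
    -0.5 * squared error (the additive constant is dropped).
    `beta` is a hypothetical weight on the KL term; beta=1.0 gives the plain ELBO.
    """
    recon_log_lik = -0.5 * sum((a - b) ** 2 for a, b in zip(x, x_recon))
    return recon_log_lik - beta * gaussian_kl(mu, log_var)
```

Under these assumptions, a perfect reconstruction with a posterior equal to the prior gives an ELBO of zero, and both reconstruction error and divergence from the prior lower it.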
