Adaptive $Q$-Network: On-the-fly Target Selection for Deep Reinforcement Learning

Abstract

Deep Reinforcement Learning (RL) is well known for being highly sensitive to hyperparameters, requiring substantial effort from practitioners to tune them for the problem at hand. This sensitivity also limits the applicability of RL in real-world scenarios. In recent years, the field of automated Reinforcement Learning (AutoRL) has grown in popularity by trying to address this issue. However, these approaches typically hinge on additional samples to select well-performing hyperparameters, hindering sample-efficiency and practicality. Furthermore, most AutoRL methods are heavily based on existing AutoML methods, which were originally developed without regard for the additional challenges inherent to RL due to its non-stationarities. In this work, we propose a new approach for AutoRL, called Adaptive $Q$-Network (AdaQN), that is tailored to RL and takes into account the non-stationarity of the optimization procedure without requiring additional samples. AdaQN learns several $Q$-functions, each trained with different hyperparameters, which are updated online using the $Q$-function with the smallest approximation error as a shared target. Our selection scheme simultaneously handles different hyperparameters while coping with the non-stationarity induced by the RL optimization procedure, and it is orthogonal to any critic-based RL algorithm. We demonstrate that AdaQN is theoretically sound and empirically validate it on MuJoCo control problems and Atari $2600$ games, showing benefits in sample-efficiency, overall performance, robustness to stochasticity, and training stability.
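To make the selection mechanism concrete, the following is a minimal tabular sketch of the idea, not the authors' implementation. It assumes that "approximation error" means the mean squared distance between a candidate $Q$-function and the one-step Bellman backup of the current shared target, and that the only hyperparameter varied is the learning rate. All names (`bellman_target`, `approximation_error`, `select_target`, `train`) and the tabular setting are hypothetical choices made for illustration.

```python
import random

def bellman_target(q_target, r, s_next, done, gamma):
    """One-step TD target computed from the shared target table."""
    return r if done else r + gamma * max(q_target[s_next])

def approximation_error(q, q_target, batch, gamma):
    """Mean squared error between q and the Bellman backup of q_target
    (assumed definition of 'approximation error' for this sketch)."""
    total = 0.0
    for s, a, r, s_next, done in batch:
        y = bellman_target(q_target, r, s_next, done, gamma)
        total += (q[s][a] - y) ** 2
    return total / len(batch)

def select_target(q_functions, q_target, batch, gamma):
    """Index of the candidate with the smallest approximation error."""
    errors = [approximation_error(q, q_target, batch, gamma)
              for q in q_functions]
    return min(range(len(errors)), key=errors.__getitem__)

def train(transitions, n_states, n_actions, learning_rates, gamma=0.9,
          steps_per_target_update=50, n_target_updates=20, seed=0):
    """Train one tabular Q-function per learning rate against a shared target.

    transitions: list of (state, action, reward, next_state, done) tuples,
    reused for every candidate, so selection needs no extra samples.
    """
    rng = random.Random(seed)
    q_functions = [[[0.0] * n_actions for _ in range(n_states)]
                   for _ in learning_rates]
    q_target = [[0.0] * n_actions for _ in range(n_states)]
    chosen = []
    for _ in range(n_target_updates):
        for _ in range(steps_per_target_update):
            s, a, r, s_next, done = rng.choice(transitions)
            y = bellman_target(q_target, r, s_next, done, gamma)
            # Every candidate regresses toward the same shared target.
            for q, lr in zip(q_functions, learning_rates):
                q[s][a] += lr * (y - q[s][a])
        # At each target update, promote the best-fitting candidate.
        best = select_target(q_functions, q_target, transitions, gamma)
        chosen.append(best)
        q_target = [row[:] for row in q_functions[best]]
    return q_target, chosen
```

In this sketch, every candidate is updated from the same transitions and the same target, so comparing candidates costs no additional environment interaction. A deep RL variant would replace the tables with networks and the tabular update with gradient steps on the TD loss.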
