A Robust Quantile Huber Loss With Interpretable Parameter Adjustment In Distributional Reinforcement Learning

Abstract

Distributional Reinforcement Learning (RL) estimates return distribution mainly by learning quantile values via minimizing the quantile Huber loss function, entailing a threshold parameter often selected heuristically or via hyperparameter search, which may not generalize well and can be suboptimal. This paper introduces a generalized quantile Huber loss function derived from Wasserstein distance (WD) calculation between Gaussian distributions, capturing noise in predicted (current) and target (Bellman-updated) quantile values. Compared to the classical quantile Huber loss, this innovative loss function enhances robustness against outliers. Notably, the classical Huber loss function can be seen as an approximation of our proposed loss, enabling parameter adjustment by approximating the amount of noise in the data during the learning process. Empirical tests on Atari games, a common application in distributional RL, and a recent hedging strategy using distributional RL, validate the effectiveness of our proposed loss function and its potential for parameter adjustments in distributional RL.
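
For orientation, the classical quantile Huber loss that the abstract takes as its baseline can be sketched as follows. This is a minimal illustration assuming the formulation commonly used in quantile-regression distributional RL, where the Huber loss with threshold kappa is weighted asymmetrically by the quantile level tau. The function names and the default kappa = 1.0 are assumptions made for this example. It is not the generalized loss proposed in the paper, whose exact form is not given in the abstract.

```python
def huber(u: float, kappa: float) -> float:
    """Classical Huber loss with threshold kappa > 0.

    Quadratic for |u| <= kappa, linear beyond it.
    """
    if abs(u) <= kappa:
        return 0.5 * u * u
    return kappa * (abs(u) - 0.5 * kappa)


def quantile_huber(u: float, tau: float, kappa: float = 1.0) -> float:
    """Classical quantile Huber loss (assumed common formulation).

    u   : error, target quantile value minus predicted quantile value
    tau : quantile level in (0, 1)
    """
    # Asymmetric weight: tau for non-negative errors, 1 - tau for negative ones.
    weight = abs(tau - (1.0 if u < 0 else 0.0))
    return weight * huber(u, kappa) / kappa


if __name__ == "__main__":
    # A small error stays in the quadratic region; a large one is penalized linearly.
    print(quantile_huber(0.5, tau=0.25))
    print(quantile_huber(-2.0, tau=0.25))
```

The threshold kappa is the parameter that the abstract describes as being chosen heuristically or by hyperparameter search: it fixes where the loss switches from quadratic to linear growth.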
