Improving Regression Performance with Distributional Losses

Abstract

There is growing evidence that converting targets to soft targets in supervised learning can provide considerable gains in performance. Much of this work has considered classification, converting hard zero-one values to soft labels, such as by adding label noise, incorporating label ambiguity, or using distillation. In parallel, there is some evidence from a regression setting in reinforcement learning that learning distributions can improve performance. In this work, we investigate the reasons for this improvement in a regression setting. We introduce a novel distributional regression loss, and similarly find it significantly improves prediction accuracy. We investigate several common hypotheses, around reducing overfitting and improved representations. We instead find evidence for an alternative hypothesis: this loss is easier to optimize, with better-behaved gradients, resulting in improved generalization. We provide theoretical support for this alternative hypothesis, by characterizing the norm of the gradients of this loss.
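
To make the idea of a distributional regression loss concrete, the following is a minimal sketch of one possible construction; it is not taken from the abstract and should not be read as the authors' exact loss. It assumes the target range is split into fixed bins, that a scalar target is turned into a soft target by spreading a Gaussian of width `sigma` over those bins, and that the model outputs one logit per bin. The function names, the bin edges, and `sigma` are all hypothetical choices made for illustration.

```python
import math


def soft_target(y, edges, sigma):
    """Soft target for scalar y: Gaussian N(y, sigma^2) mass per bin,
    renormalised over the covered range (assumes y lies near that range)."""
    def cdf(x):
        return 0.5 * (1.0 + math.erf((x - y) / (sigma * math.sqrt(2.0))))

    masses = [cdf(hi) - cdf(lo) for lo, hi in zip(edges[:-1], edges[1:])]
    total = sum(masses)
    return [m / total for m in masses]


def softmax(logits):
    """Numerically stable softmax over the per-bin logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    s = sum(exps)
    return [e / s for e in exps]


def distributional_loss(logits, y, edges, sigma):
    """Cross-entropy between the soft target and the predicted histogram."""
    p = soft_target(y, edges, sigma)
    q = softmax(logits)
    return -sum(pi * math.log(qi) for pi, qi in zip(p, q) if pi > 0.0)


def grad_wrt_logits(logits, y, edges, sigma):
    """Gradient of the loss above w.r.t. the logits: predicted minus target."""
    p = soft_target(y, edges, sigma)
    q = softmax(logits)
    return [qi - pi for qi, pi in zip(q, p)]


def predicted_mean(logits, edges):
    """Point prediction: expected bin centre under the predicted histogram."""
    q = softmax(logits)
    centers = [(lo + hi) / 2.0 for lo, hi in zip(edges[:-1], edges[1:])]
    return sum(qi * ci for qi, ci in zip(q, centers))


if __name__ == "__main__":
    edges = [0.0, 1.0, 2.0, 3.0, 4.0]   # hypothetical bin edges
    logits = [0.0, 0.0, 0.0, 0.0]       # untrained, uniform prediction
    print(distributional_loss(logits, 2.0, edges, sigma=0.5))
    print(grad_wrt_logits(logits, 2.0, edges, sigma=0.5))
    print(predicted_mean(logits, edges))
```

In this sketch each gradient component is a difference of two probabilities, so it stays in [-1, 1] regardless of how far the prediction is from the target, whereas the gradient of a squared-error loss grows with the error. That is only an illustration of what better-behaved gradients could mean under these assumptions.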
