How Does Return Distribution in Distributional Reinforcement Learning Help Optimization?

  • 2024-09-23 01:19:05
  • Ke Sun, Bei Jiang, Linglong Kong


Distributional reinforcement learning, which focuses on learning the entire return distribution instead of only its expectation as in standard RL, has demonstrated remarkable success in enhancing performance. Despite these advancements, our understanding of how the return distribution within distributional RL helps optimization remains limited. In this study, we investigate the optimization advantages of distributional RL by utilizing its extra return distribution knowledge over classical RL within the Neural Fitted Z-Iteration (Neural FZI) framework. To begin with, we demonstrate that the distribution loss of distributional RL has desirable smoothness characteristics and hence enjoys stable gradients, which is in line with its tendency to promote optimization stability. Furthermore, the acceleration effect of distributional RL is revealed by decomposing the return distribution. This decomposition shows that distributional RL can perform favorably if the return distribution approximation is appropriate, as measured by the variance of gradient estimates in each environment. Rigorous experiments validate the stable optimization behaviors of distributional RL and its acceleration effects compared to classical RL. Our research findings illuminate how the return distribution in distributional RL algorithms helps optimization.

