Uncertainty-aware Distributional Offline Reinforcement Learning

  • 2024-03-26 13:28:04
  • Xiaocong Chen, Siyu Wang, Tong Yu, Lina Yao


Offline reinforcement learning (RL) presents distinct challenges because it relies solely on observational data. A central concern in this setting is ensuring the safety of the learned policy by quantifying the uncertainties associated with various actions and with environmental stochasticity. Traditional approaches primarily emphasize mitigating epistemic uncertainty by learning risk-averse policies, often overlooking environmental stochasticity. In this study, we propose an uncertainty-aware distributional offline RL method that addresses both epistemic uncertainty and environmental stochasticity simultaneously. The method is a model-free offline RL algorithm capable of learning risk-averse policies and characterizing the entire distribution of discounted cumulative rewards, as opposed to merely maximizing the expected value of accumulated discounted returns. Our method is evaluated through comprehensive experiments on both risk-sensitive and risk-neutral benchmarks, demonstrating its superior performance.
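
To illustrate the distinction between maximizing an expected return and acting on the full return distribution, the following is a minimal sketch, not the authors' algorithm. It assumes each action's return distribution is summarized by a handful of equally weighted quantile estimates and uses conditional value at risk (CVaR) as the risk-averse criterion. CVaR is one common choice of risk measure; the abstract does not state which one the paper uses. The function names, the `alpha` parameter, and the numbers are all hypothetical.

```python
import math


def mean(quantiles):
    """Risk-neutral score: average of equally weighted return quantiles."""
    return sum(quantiles) / len(quantiles)


def cvar(quantiles, alpha):
    """Risk-averse score: average of the worst alpha-fraction of quantiles."""
    ordered = sorted(quantiles)
    k = max(1, math.ceil(alpha * len(ordered)))  # keep at least one quantile
    tail = ordered[:k]
    return sum(tail) / len(tail)


def select_action(quantiles_per_action, alpha=None):
    """Return the index of the best action.

    alpha=None ranks actions by expected return; otherwise by CVaR at alpha.
    """
    if alpha is None:
        scores = [mean(q) for q in quantiles_per_action]
    else:
        scores = [cvar(q, alpha) for q in quantiles_per_action]
    return max(range(len(scores)), key=scores.__getitem__)


# Hypothetical return quantiles for two actions (illustrative values only).
safe = [0.9, 1.0, 1.0, 1.1]     # narrow distribution, mean 1.0
risky = [-2.0, 1.0, 2.0, 4.0]   # higher mean (1.25) but a heavy lower tail

print(select_action([safe, risky]))              # risk-neutral choice: 1
print(select_action([safe, risky], alpha=0.25))  # risk-averse choice: 0
```

In this toy case the risk-neutral criterion prefers the second action because of its higher mean, while the CVaR criterion prefers the first because its worst-case outcomes are far milder. A decision of this kind is only possible when the whole return distribution, rather than its mean alone, is available.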


