Abstract
Offline reinforcement learning (RL) presents distinct challenges as it relies solely on observational data. A central concern in this context is ensuring the safety of the learned policy by quantifying uncertainties associated with various actions and environmental stochasticity. Traditional approaches primarily emphasize mitigating epistemic uncertainty by learning risk-averse policies, often overlooking environmental stochasticity. In this study, we propose an uncertainty-aware distributional offline RL method to simultaneously address both epistemic uncertainty and environmental stochasticity. We propose a model-free offline RL algorithm capable of learning risk-averse policies and characterizing the entire distribution of discounted cumulative rewards, as opposed to merely maximizing the expected value of accumulated discounted returns. Our method is rigorously evaluated through comprehensive experiments in both risk-sensitive and risk-neutral benchmarks, demonstrating its superior performance.