Abstract
In this paper, we propose the Quantile Option Architecture (QUOTA) for exploration based on recent advances in distributional reinforcement learning (RL). In QUOTA, decision making is based on quantiles of a value distribution, not only the mean. QUOTA provides a new dimension for exploration by making use of both the optimism and the pessimism of a value distribution. We demonstrate the performance advantage of QUOTA in both challenging video games and physical robot simulators.
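To make the idea of quantile-based decision making concrete, the following is a minimal sketch, not the QUOTA architecture itself. It assumes each action comes with a small list of quantile estimates of its return, sorted by ascending quantile level, and compares greedy selection by the mean with greedy selection by a low (pessimistic) or high (optimistic) quantile. The function name `greedy_action`, the parameter `k`, and all numbers are hypothetical and chosen only for illustration.

```python
from statistics import mean


def greedy_action(quantiles, k=None):
    """Return the index of the greedy action.

    quantiles[a][j] is the j-th quantile estimate of action a's return,
    ordered by ascending quantile level (an assumption of this sketch).
    With k=None, actions are ranked by the mean of their quantile estimates;
    otherwise they are ranked by their k-th quantile estimate.
    """
    if k is None:
        scores = [mean(q) for q in quantiles]
    else:
        scores = [q[k] for q in quantiles]
    return max(range(len(scores)), key=scores.__getitem__)


# Hypothetical quantile estimates for two actions (illustrative numbers only).
quantiles = [
    [0.0, 1.0, 2.0, 3.0, 4.0],    # action 0: narrow distribution, mean 2.0
    [-4.0, -1.0, 1.0, 3.0, 9.0],  # action 1: wide distribution, mean 1.6
]

mean_choice = greedy_action(quantiles)              # rank by the mean
pessimistic_choice = greedy_action(quantiles, k=0)  # rank by the lowest quantile
optimistic_choice = greedy_action(quantiles, k=4)   # rank by the highest quantile

print(mean_choice, pessimistic_choice, optimistic_choice)
```

In this toy example the mean and the lowest quantile both favor action 0, while the highest quantile favors action 1, so ranking by different quantiles can lead to different behavior than ranking by the mean alone.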