Abstract
The ability to assess risk and make risk-aware decisions is essential for applying reinforcement learning to safety-critical robots such as drones. In this paper, we investigate a specific case in which a nano quadcopter learns to navigate an a priori unknown, cluttered environment under partial observability. We present a distributional reinforcement learning framework that generates adaptive risk-tendency policies. Specifically, we propose using the lower-tail conditional variance of the learnt return distribution as an estimate of intrinsic uncertainty, and using exponentially weighted average forecasting (EWAF) to adapt the risk tendency according to the estimated uncertainty. Through simulation and real-world experiments, we show that (1) the most effective risk tendency varies across states, and (2) the agent with adaptive risk tendency outperforms risk-neutral and risk-averse policy baselines.
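The two ingredients named above can be illustrated with a small sketch. It is not the authors' implementation, and several details are assumptions. First, the return distribution is represented by N quantile estimates per action. Second, the lower-tail conditional variance is taken as the mean squared deviation of the quantiles at or below the median from the median. Third, the risk tendency is expressed as a CVaR level alpha, where alpha = 1 is risk-neutral and smaller alpha is more risk-averse. Fourth, EWAF treats a fixed set of candidate alpha values as "experts" and forecasts their weighted average. The names `lower_tail_conditional_variance`, `cvar`, `EWAF`, `eta`, and the gain mapping `-u * alpha` are hypothetical; the abstract does not say how the uncertainty estimate is converted into per-expert feedback.

```python
import math
from typing import List, Sequence


def lower_tail_conditional_variance(quantiles: Sequence[float]) -> float:
    """Spread of the lower half of a quantile-represented return distribution.

    Assumption: mean squared deviation of the quantiles at or below the
    median, measured from the median.
    """
    q = sorted(quantiles)
    n = len(q)
    median = (q[(n - 1) // 2] + q[n // 2]) / 2.0
    lower = [x for x in q if x <= median]
    return sum((x - median) ** 2 for x in lower) / len(lower)


def cvar(quantiles: Sequence[float], alpha: float) -> float:
    """CVaR_alpha: mean of the lowest ceil(alpha * N) quantiles (alpha=1 gives the mean)."""
    q = sorted(quantiles)
    k = max(1, math.ceil(alpha * len(q)))
    return sum(q[:k]) / k


class EWAF:
    """Exponentially weighted average forecaster over candidate risk levels."""

    def __init__(self, experts: Sequence[float], eta: float = 0.5):
        self.experts: List[float] = list(experts)  # candidate CVaR levels
        self.eta = eta                              # learning rate (hypothetical value)
        self.log_weights: List[float] = [0.0] * len(self.experts)

    def probabilities(self) -> List[float]:
        # Subtract the max log-weight before exponentiating for numerical stability.
        m = max(self.log_weights)
        w = [math.exp(lw - m) for lw in self.log_weights]
        total = sum(w)
        return [x / total for x in w]

    def forecast(self) -> float:
        # Weighted average of the experts' risk levels.
        return sum(p * e for p, e in zip(self.probabilities(), self.experts))

    def update(self, gains: Sequence[float]) -> None:
        # Multiplicative-weights update: w_k <- w_k * exp(eta * gain_k).
        self.log_weights = [lw + self.eta * g for lw, g in zip(self.log_weights, gains)]


if __name__ == "__main__":
    # Made-up quantile estimates for two actions in one state (8 quantiles each).
    quantiles = {
        "forward": [-5.0, -1.0, 0.5, 1.0, 1.5, 2.0, 2.5, 3.0],
        "hover": [0.0, 0.2, 0.4, 0.6, 0.8, 1.0, 1.2, 1.4],
    }
    forecaster = EWAF(experts=[0.25, 0.5, 1.0], eta=1.0)
    # Hypothetical: uncertainty measured on one action's return distribution.
    u = lower_tail_conditional_variance(quantiles["forward"])
    # Hypothetical gain mapping: higher uncertainty favours smaller (more risk-averse) alpha.
    forecaster.update([-u * a for a in forecaster.experts])
    alpha = forecaster.forecast()
    action = max(quantiles, key=lambda a: cvar(quantiles[a], alpha))
    print(f"uncertainty={u:.3f} alpha={alpha:.3f} action={action}")
```

With these made-up numbers, the heavy lower tail of `forward` yields a large uncertainty estimate. The forecaster therefore shifts weight toward the most risk-averse level, and the CVaR-greedy choice switches to `hover`. A different gain signal would give different behaviour, so the mapping should be read as a placeholder rather than the method described in the paper.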