Combating Reinforcement Learning's Sisyphean Curse with Intrinsic Fear

  • 2018-03-13 21:24:47
  • Zachary C. Lipton, Kamyar Azizzadenesheli, Abhishek Kumar, Lihong Li, Jianfeng Gao, Li Deng

Abstract

Many practical environments contain catastrophic states that an optimal agent would visit infrequently or never. Even on toy problems, Deep Reinforcement Learning (DRL) agents tend to periodically revisit these states upon forgetting their existence under a new policy. We introduce intrinsic fear (IF), a learned reward shaping that guards DRL agents against periodic catastrophes. IF agents possess a fear model trained to predict the probability of imminent catastrophe. This score is then used to penalize the Q-learning objective. Our theoretical analysis bounds the reduction in average return due to learning on the perturbed objective. We also prove robustness to classification errors. As a bonus, IF models tend to learn faster, owing to reward shaping. Experiments demonstrate that intrinsic-fear DQNs solve otherwise pathological environments and improve on several Atari games.
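
As a rough illustration of how a fear score might penalize a Q-learning objective, the sketch below subtracts a weighted catastrophe probability from the usual one-step bootstrapped target. The abstract does not give the exact form, so the function name, the `fear_weight` coefficient, and the choice to score the next state are all assumptions made here for illustration, not necessarily the authors' formulation.

```python
def intrinsic_fear_target(reward, next_q_values, fear_prob,
                          gamma=0.99, fear_weight=1.0, terminal=False):
    """One-step Q-learning target with a fear penalty (illustrative sketch).

    reward        -- environment reward for the transition
    next_q_values -- Q-value estimates for each action in the next state
    fear_prob     -- fear model's predicted probability, in [0, 1], that
                     catastrophe is imminent from the next state (assumed input)
    fear_weight   -- hypothetical coefficient scaling the penalty
    """
    # Standard bootstrapped value; zero when the episode has ended.
    bootstrap = 0.0 if terminal else gamma * max(next_q_values)
    # Assumed penalty form: subtract the weighted fear score from the target.
    return reward + bootstrap - fear_weight * fear_prob
```

With `fear_prob` equal to zero this reduces to the ordinary Q-learning target, so under these assumptions the penalty only changes learning near states the fear model flags as dangerous.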

 
