DQN-TAMER: Human-in-the-Loop Reinforcement Learning with Intractable Feedback

Abstract

Exploration is one of the greatest challenges in reinforcement learning (RL) and a major obstacle to applying RL to robotics. Even with state-of-the-art RL algorithms, building a well-learned agent often requires too many trials, mainly because it is difficult to match the agent's actions with rewards that arrive in the distant future. A remedy is to train the agent with real-time feedback from a human observer who immediately gives rewards for some actions. This study tackles a series of challenges in introducing such a human-in-the-loop RL scheme. The first contribution of this work is our experiments with a precisely modeled human observer whose feedback is characterized by five properties: binary, delay, stochasticity, unsustainability, and natural reaction. We also propose an RL method called DQN-TAMER, which efficiently uses both human feedback and distant rewards. We find that DQN-TAMER agents outperform their baselines in the Maze and Taxi simulated environments. Furthermore, we demonstrate a real-world human-in-the-loop RL application in which a camera automatically recognizes a user's facial expressions as feedback to the agent while the agent explores a maze.
