Value Driven Representation for Human-in-the-Loop Reinforcement Learning

Abstract

Interactive adaptive systems powered by Reinforcement Learning (RL) have many potential applications, such as intelligent tutoring systems. In such systems there is typically an external human system designer who creates, monitors, and modifies the interactive adaptive system, trying to improve its performance on the target outcomes. In this paper we focus on the algorithmic foundations of how to help the system designer choose the set of sensors or features that define the observation space used by the reinforcement learning agent. We present an algorithm, value driven representation (VDR), that can iteratively and adaptively augment the observation space of a reinforcement learning agent so that it is sufficient to capture a (near) optimal policy. To do so we introduce a new method to optimistically estimate the value of a policy using offline simulated Monte Carlo rollouts. We evaluate the performance of our approach on standard RL benchmarks with simulated humans and demonstrate significant improvement over prior baselines.
