The Concept of Criticality in Reinforcement Learning

Abstract

Reinforcement learning methods carry a well-known bias-variance trade-off in n-step algorithms for optimal control. Unfortunately, this has rarely been addressed in current research. The trade-off holds independently of the choice of algorithm, such as n-step SARSA, n-step Expected SARSA or n-step Tree Backup. A small n results in a large bias, while a large n leads to a large variance. The literature offers no straightforward recipe for the best choice of this value. While all current n-step algorithms use a fixed value of n over the whole state space, we extend the framework of n-step updates by allowing each state to have its own n. We propose a solution to this problem within the context of human-aided reinforcement learning. Our approach is based on the observation that a human can learn more efficiently if she receives input regarding the criticality of a given state, and thus the amount of attention she needs to invest in learning in that state. This observation is related to the idea that each state of the MDP has a certain measure of criticality, which indicates how much the choice of action in that state influences the return. In our algorithm, the RL agent uses the criticality measure, a function provided by a human trainer, to locally choose the best step number n for the update of the Q function.
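
The idea of a state-specific n can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the assumption that criticality lies in [0, 1], the bounds n_min and n_max, and in particular the direction of the mapping (higher criticality gives a smaller n) are all hypothetical choices made for illustration. The abstract only states that n is chosen locally from the criticality measure.

```python
from typing import Sequence


def n_step_target(rewards: Sequence[float], q_bootstrap: float,
                  gamma: float, n: int) -> float:
    """Standard n-step SARSA target:
    sum_{k=0}^{n-1} gamma^k * r_k + gamma^n * Q(s_n, a_n)."""
    if n < 1 or n > len(rewards):
        raise ValueError("need 1 <= n <= len(rewards)")
    g = 0.0
    for k in range(n):
        g += (gamma ** k) * rewards[k]
    return g + (gamma ** n) * q_bootstrap


def n_from_criticality(criticality: float, n_min: int = 1,
                       n_max: int = 8) -> int:
    """Hypothetical mapping from a criticality value in [0, 1] to a step
    number. ASSUMPTION: critical states get a short lookahead, uncritical
    states a long one. The source does not specify this rule."""
    c = min(max(criticality, 0.0), 1.0)  # clamp to [0, 1]
    return n_max - round(c * (n_max - n_min))


# Usage: rewards observed after the state being updated, and the
# Q-value estimates at the states reached after 1, 2, ... steps.
rewards = [1.0, 1.0, 1.0]
q_after = [10.0, 10.0, 10.0]   # q_after[n - 1] = Q(s_n, a_n), made-up values
crit = 1.0                     # value a human trainer might assign
n = n_from_criticality(crit, n_min=1, n_max=3)
target = n_step_target(rewards, q_after[n - 1], gamma=0.5, n=n)
```

With a fixed n the second function would be a constant; making it depend on the state is the only change to the usual n-step update in this sketch.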
