Abstract
Reinforcement learning (RL) has recently been applied to many challenging tasks. To perform well, however, it requires access to a good reward function, which is often sparse or manually engineered and therefore prone to error. Incorporating human prior knowledge, for example through imitation learning, learning from preferences, or inverse reinforcement learning, is often seen as a possible solution to this problem. Learning from feedback is another framework, one that enables an RL agent to learn from binary evaluative signals describing the teacher's (positive or negative) evaluation of the agent's action. However, these methods often assume that the teacher's evaluative feedback is perfect, which is a restrictive assumption. In practice, such feedback can be noisy due to limited teacher expertise or other exacerbating factors such as cognitive load, availability, and distraction. In this work, we propose the CANDERE-COACH algorithm, which is capable of learning from noisy feedback given by a nonoptimal teacher. We propose a noise-filtering mechanism to de-noise online feedback data, thereby enabling the RL agent to learn successfully even when up to 40% of the teacher feedback is incorrect. Experiments on three common domains demonstrate the effectiveness of the proposed approach.