Aligning Humans and Robots via Reinforcement Learning from Implicit Human Feedback

Abstract

Conventional reinforcement learning (RL) approaches often struggle to learn effective policies under sparse reward conditions, necessitating the manual design of complex, task-specific reward functions. To address this limitation, reinforcement learning from human feedback (RLHF) has emerged as a promising strategy that complements hand-crafted rewards with human-derived evaluation signals. However, most existing RLHF methods depend on explicit feedback mechanisms such as button presses or preference labels, which disrupt the natural interaction process and impose a substantial cognitive load on the user. We propose a novel reinforcement learning from implicit human feedback (RLIHF) framework that utilizes non-invasive electroencephalography (EEG) signals, specifically error-related potentials (ErrPs), to provide continuous, implicit feedback without requiring explicit user intervention. The proposed method adopts a pre-trained decoder to transform raw EEG signals into probabilistic reward components, enabling effective policy learning even in the presence of sparse external rewards. We evaluate our approach in a simulation environment built on the MuJoCo physics engine, using a Kinova Gen2 robotic arm to perform a complex pick-and-place task that requires avoiding obstacles while manipulating target objects. The results show that agents trained with decoded EEG feedback achieve performance comparable to those trained with dense, manually designed rewards. These findings validate the potential of using implicit neural feedback for scalable and human-aligned reinforcement learning in interactive robotics.
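
To make the idea of a "probabilistic reward component" concrete, the following is a minimal sketch of how a decoded ErrP probability might be folded into a sparse task reward. The abstract does not give the reward formulation, so the linear penalty used here, the function name `shaped_reward`, and the parameters `p_error` and `weight` are all assumptions made for illustration and may differ from the authors' method.

```python
def shaped_reward(sparse_reward: float, p_error: float, weight: float = 1.0) -> float:
    """Combine a sparse task reward with an implicit-feedback term.

    Hypothetical illustration only: `p_error` stands in for the probability,
    produced by a pre-trained EEG decoder, that the observer perceived the
    robot's last action as an error (an ErrP). The linear penalty is an
    assumed form, not one taken from the paper.
    """
    if not 0.0 <= p_error <= 1.0:
        raise ValueError("p_error must be a probability in [0, 1]")
    # Penalize the agent in proportion to the decoded error probability.
    return sparse_reward - weight * p_error


# Usage: a step with no external reward still yields a learning signal.
step_reward = shaped_reward(sparse_reward=0.0, p_error=0.25, weight=2.0)
```

Under this assumed form, steps that earn no external reward still produce a graded signal whenever the decoder reports a nonzero error probability, which is one way implicit feedback could densify an otherwise sparse reward.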
