Human-guided Robot Behavior Learning: A GAN-assisted Preference-based Reinforcement Learning Approach

Abstract

Human demonstrations can provide trustworthy samples to train reinforcement learning algorithms for robots to learn complex behaviors in real-world environments. However, obtaining sufficient demonstrations may be impractical because many behaviors are difficult for humans to demonstrate. A more practical approach is to replace human demonstrations with human queries, i.e., preference-based reinforcement learning. One key limitation of existing algorithms is the need for a significant number of human queries, because a large amount of labeled data is needed to train neural networks that approximate a continuous, high-dimensional reward function. To minimize the need for human queries, we propose a new GAN-assisted human preference-based reinforcement learning approach that uses a generative adversarial network (GAN) to actively learn human preferences and then replace the human's role in assigning preferences. The adversarial neural network is simple and has only a binary output, and hence requires far fewer human queries to train. Moreover, a maximum entropy based reinforcement learning algorithm is designed to shape the loss towards the desired regions or away from the undesired regions. To show the effectiveness of the proposed approach, we present studies on complex robotic tasks without access to the environment reward in a typical MuJoCo robot locomotion environment. The results show that our method can achieve a reduction of about 99.8% in human time without sacrificing performance.
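
To make the idea of a binary-output preference model concrete, the following is a minimal sketch and not the method of this paper: it swaps the GAN for a plain logistic-regression classifier, and it assumes that each behavior segment is summarized by a small feature vector. The simulated human, the feature dimension, and all function names are hypothetical and chosen only for illustration. The classifier is trained on a small number of labeled queries and then assigns preferences to new pairs without further queries.

```python
import math
import random


def simulated_human_prefers_first(seg_a, seg_b):
    """Hypothetical stand-in for a human query (assumption, not from the paper):
    the 'human' prefers the segment whose features have the larger sum."""
    return 1 if sum(seg_a) > sum(seg_b) else 0


def train_preference_classifier(pairs, labels, epochs=200, lr=0.5):
    """Fit logistic regression on feature differences.
    This is a simple stand-in for the binary-output network, not a GAN."""
    dim = len(pairs[0][0])
    w = [0.0] * dim
    for _ in range(epochs):
        for (a, b), y in zip(pairs, labels):
            diff = [ai - bi for ai, bi in zip(a, b)]
            z = sum(wi * di for wi, di in zip(w, diff))
            z = max(-30.0, min(30.0, z))  # clamp for numerical safety
            p = 1.0 / (1.0 + math.exp(-z))
            # Stochastic gradient step on the binary cross-entropy loss.
            w = [wi + lr * (y - p) * di for wi, di in zip(w, diff)]
    return w


def predict_preference(w, seg_a, seg_b):
    """Return 1 if the model prefers seg_a over seg_b, else 0."""
    z = sum(wi * (ai - bi) for wi, ai, bi in zip(w, seg_a, seg_b))
    return 1 if z > 0 else 0


def random_pair(rng, dim=3):
    """Draw two random feature vectors (hypothetical segment summaries)."""
    return ([rng.uniform(-1, 1) for _ in range(dim)],
            [rng.uniform(-1, 1) for _ in range(dim)])


rng = random.Random(0)

# A small budget of "human" queries is used for training.
queried = [random_pair(rng) for _ in range(30)]
labels = [simulated_human_prefers_first(a, b) for a, b in queried]
w = train_preference_classifier(queried, labels)

# The trained model then labels new pairs without any further queries.
unlabeled = [random_pair(rng) for _ in range(500)]
agreement = sum(
    predict_preference(w, a, b) == simulated_human_prefers_first(a, b)
    for a, b in unlabeled
) / len(unlabeled)
print(f"agreement with simulated human: {agreement:.2f}")
```

In this toy setting only 30 pairs are labeled by the simulated human, while the remaining 500 pairs are labeled by the model. It illustrates why a binary preference predictor can reduce the query burden, but it says nothing about the query counts or savings reported in the abstract.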
