Abstract
This paper develops a novel rating-based reinforcement learning approach that uses human ratings to obtain human guidance in reinforcement learning. Unlike existing preference-based and ranking-based reinforcement learning paradigms, which rely on human relative preferences over sample pairs, the proposed rating-based reinforcement learning approach is based on human evaluation of individual trajectories, without relative comparisons between sample pairs. The approach builds on a new prediction model for human ratings and a novel multi-class loss function. We conduct several experimental studies based on synthetic ratings and real human ratings to evaluate the effectiveness and benefits of the new rating-based reinforcement learning approach.