Subgoal-based Reward Shaping to Improve Efficiency in Reinforcement Learning

Abstract

Reinforcement learning, which acquires a policy that maximizes long-term rewards, has been actively studied. Unfortunately, this type of learning is too slow and difficult to use in practical situations because the state-action space becomes huge in real environments. Many studies have incorporated human knowledge into reinforcement learning. Although human knowledge of trajectories is often used, obtaining it may require a human to control an AI agent, which can be difficult. Knowledge of subgoals may lessen this requirement because humans need only consider a few representative states on an optimal trajectory. Rewards are the essential factor for learning efficiency, and potential-based reward shaping is a basic method for enriching them. However, it is often difficult to use subgoals to accelerate learning through potential-based reward shaping, because the appropriate potentials are not intuitive for humans. We extend potential-based reward shaping and propose subgoal-based reward shaping. The method makes it easier for human trainers to share their knowledge of subgoals. To evaluate our method, we obtained subgoal series from participants and conducted experiments in three domains: four-rooms (discrete states and discrete actions), pinball (continuous states and discrete actions), and picking (continuous states and continuous actions). We compared our method with a baseline reinforcement learning algorithm and with other subgoal-based methods, including random subgoals and naive subgoal-based reward shaping. We found that our reward shaping outperformed all other methods in learning efficiency.
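
As a concrete starting point, the sketch below shows standard potential-based reward shaping, F(s, s') = γΦ(s') − Φ(s), paired with a hypothetical potential that grows with the number of subgoals, from a human-provided series, that the agent has reached in order. This is not the paper's subgoal-based shaping method, whose definition the abstract does not give. The augmented state (position, subgoals reached), the helpers `subgoal_potential` and `step_progress`, and the grid coordinates are all illustrative assumptions.

```python
from typing import Callable, Sequence, Tuple

Position = Tuple[int, int]
# Hypothetical augmented state: (grid position, number of subgoals reached so far).
# Tracking progress in the state keeps the potential a function of state alone.
State = Tuple[Position, int]


def potential_shaping(phi: Callable[[State], float], s: State, s_next: State,
                      gamma: float, terminal: bool = False) -> float:
    """Potential-based shaping term F(s, s') = gamma * Phi(s') - Phi(s).

    The potential of a terminal successor is taken as 0, the usual
    convention for episodic tasks."""
    phi_next = 0.0 if terminal else phi(s_next)
    return gamma * phi_next - phi(s)


def subgoal_potential(n_subgoals: int, scale: float = 1.0) -> Callable[[State], float]:
    """Illustrative potential (an assumption, not the paper's definition):
    grows linearly with the number of ordered subgoals already reached."""
    def phi(state: State) -> float:
        _, reached = state
        return scale * reached / n_subgoals
    return phi


def step_progress(reached: int, position: Position,
                  subgoals: Sequence[Position]) -> int:
    """Advance the counter only when the agent hits the *next* subgoal in order."""
    if reached < len(subgoals) and position == subgoals[reached]:
        return reached + 1
    return reached


if __name__ == "__main__":
    subgoals = [(2, 5), (5, 8)]  # hypothetical subgoal series from a human trainer
    phi = subgoal_potential(len(subgoals))
    gamma = 0.99

    path = [(1, 5), (2, 5), (3, 5), (4, 5), (5, 5), (5, 6), (5, 7), (5, 8)]
    reached = step_progress(0, path[0], subgoals)
    state = (path[0], reached)
    for pos in path[1:]:
        reached = step_progress(reached, pos, subgoals)
        next_state = (pos, reached)
        # The shaped reward an agent would learn from is env_reward + f.
        f = potential_shaping(phi, state, next_state, gamma)
        print(pos, reached, round(f, 4))
        state = next_state
```

Because the shaping terms telescope, their discounted sum over an episode is γ^T Φ(s_T) − Φ(s_0), the property behind the policy-invariance guarantee of potential-based shaping. The difficulty the abstract points to is choosing Φ: a trainer can name subgoals far more easily than they can assign numeric potentials to states.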
