Integration of Imitation Learning using GAIL and Reinforcement Learning using Task-achievement Rewards via Probabilistic Generative Model

  • 2019-07-03 21:38:48
  • Akira Kinose, Tadahiro Taniguchi
  • 3

Abstract

Integration of reinforcement learning and imitation learning is an important problem that has been studied for a long time in the field of intelligent robotics. Reinforcement learning optimizes policies to maximize the cumulative reward, whereas imitation learning attempts to extract general knowledge from the trajectories demonstrated by experts, i.e., demonstrators. Because each approach has its own drawbacks, methods that combine the two so that each compensates for the other's drawbacks have been explored. However, many of these methods are heuristic and lack a solid theoretical basis. In this paper, we present a new theory for integrating reinforcement and imitation learning by extending the probabilistic generative model framework for reinforcement learning known as "plan by inference". We develop a new probabilistic graphical model for reinforcement learning with multiple types of rewards, a probabilistic graphical model for Markov decision processes with multiple optimality emissions (pMDP-MO). Furthermore, we demonstrate that the integrated learning of reinforcement learning and imitation learning can be formulated as probabilistic inference of policies on pMDP-MO by treating the output of the discriminator in generative adversarial imitation learning (GAIL) as an additional optimality emission observation. We adapt GAIL and a task-achievement reward to our proposed framework, achieving significantly better performance than agents trained with reinforcement learning or imitation learning alone. Experiments demonstrate that our framework successfully integrates imitation and reinforcement learning even when only a few demonstrators are available.
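In the plan-by-inference view, each optimality variable O is a binary observation whose likelihood grows with the reward. A minimal sketch of why treating the GAIL discriminator output D(s, a) as a second optimality emission yields an additive reward term follows: if the task emission and the GAIL emission are conditionally independent given (s, a), their likelihoods multiply, so their log-likelihoods add, giving r_task(s, a) + log D(s, a) per step. The function names, the choice p(O_gail = 1 | s, a) = D(s, a), and the clipping constant eps are illustrative assumptions, not the authors' exact formulation.

```python
import math
from typing import Sequence


def optimality_log_likelihood(task_reward: float, disc_prob: float,
                              eps: float = 1e-8) -> float:
    """Per-step log p(O_task = 1, O_gail = 1 | s, a), up to an additive constant.

    Illustrative assumptions (not taken from the paper):
      * p(O_task = 1 | s, a) is proportional to exp(task_reward)
      * p(O_gail = 1 | s, a) = D(s, a), the discriminator output in (0, 1)
      * the two emissions are conditionally independent given (s, a)
    """
    # Clip D(s, a) away from 0 and 1 so log() never receives 0.
    d = min(max(disc_prob, eps), 1.0 - eps)
    # Conditional independence turns the joint likelihood into a product,
    # so the two log-likelihood (reward) terms simply add.
    return task_reward + math.log(d)


def trajectory_log_likelihood(task_rewards: Sequence[float],
                              disc_probs: Sequence[float]) -> float:
    """Sum of per-step terms: the combined return a policy maximizes in this sketch."""
    if len(task_rewards) != len(disc_probs):
        raise ValueError("task_rewards and disc_probs must have the same length")
    return sum(optimality_log_likelihood(r, d)
               for r, d in zip(task_rewards, disc_probs))
```

For example, trajectory_log_likelihood([0.0, -1.0], [0.5, 0.5]) returns -1 + 2 log 0.5 (about -2.386). A higher discriminator output, meaning a more expert-like (s, a), raises the objective in the same way a higher task reward does.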

 


Appendix

Experiment Details

Table: Action space of the Pusher environment
index  name                     range
1      r_shoulder_pan_joint     [-2, 2]
2      r_shoulder_lift_joint    [-2, 2]
3      r_upper_arm_roll_joint   [-2, 2]
4      r_elbow_flex_joint       [-2, 2]
5      r_forearm_roll_joint     [-2, 2]
6      r_wrist_flex_joint       [-2, 2]
7      r_wrist_roll_joint       [-2, 2]

Table: Observation space of the Pusher environment

index  name                                    range
1-7    angles of the 7 joints
         r_shoulder_pan_joint                  [-2.2854, 1.714602]
         r_shoulder_lift_joint                 [-0.5236, 1.3963]
         r_upper_arm_roll_joint                [-1.5, 1.7]
         r_elbow_flex_joint                    [-2.3213, 0]
         r_forearm_roll_joint                  [-1.5, 1.5]
         r_wrist_flex_joint                    [-1.094, 0]
         r_wrist_roll_joint                    [-1.5, 1.5]
8-14   angular velocities of the 7 joints
15-17  x, y, z coordinates of the end effector
18-20  x, y, z coordinates of the object
21-23  x, y, z coordinates of the goal
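The observation and action layouts listed above can be made concrete with a small parsing helper. This is a hypothetical sketch: the names OBS_LAYOUT, JOINT_ANGLE_LIMITS, split_observation, clip_action, and joints_within_limits are illustrative assumptions, not part of the authors' code or of any simulator API. It only encodes the index ranges and numeric bounds from the tables, converting the 1-based table indices to 0-based Python slices.

```python
from typing import Dict, List, Sequence, Tuple

# Hypothetical layout of the 23-dim Pusher observation, taken from the table
# (table indices are 1-based; the slices below are 0-based).
OBS_DIM = 23
OBS_LAYOUT: Dict[str, slice] = {
    "joint_angles": slice(0, 7),        # indices 1-7
    "joint_velocities": slice(7, 14),   # indices 8-14
    "end_effector_xyz": slice(14, 17),  # indices 15-17
    "object_xyz": slice(17, 20),        # indices 18-20
    "goal_xyz": slice(20, 23),          # indices 21-23
}

# Joint-angle ranges as listed in the observation table (dict order = joint order).
JOINT_ANGLE_LIMITS: Dict[str, Tuple[float, float]] = {
    "r_shoulder_pan_joint": (-2.2854, 1.714602),
    "r_shoulder_lift_joint": (-0.5236, 1.3963),
    "r_upper_arm_roll_joint": (-1.5, 1.7),
    "r_elbow_flex_joint": (-2.3213, 0.0),
    "r_forearm_roll_joint": (-1.5, 1.5),
    "r_wrist_flex_joint": (-1.094, 0.0),
    "r_wrist_roll_joint": (-1.5, 1.5),
}

# Every action dimension (one per joint) lies in [-2, 2].
ACTION_DIM = 7
ACTION_LOW, ACTION_HIGH = -2.0, 2.0


def split_observation(obs: Sequence[float]) -> Dict[str, List[float]]:
    """Split a flat observation vector into its named components."""
    if len(obs) != OBS_DIM:
        raise ValueError(f"expected {OBS_DIM} values, got {len(obs)}")
    values = list(obs)
    return {name: values[s] for name, s in OBS_LAYOUT.items()}


def joints_within_limits(obs: Sequence[float]) -> bool:
    """Check every joint angle against the range listed for that joint."""
    angles = split_observation(obs)["joint_angles"]
    return all(lo <= q <= hi
               for q, (lo, hi) in zip(angles, JOINT_ANGLE_LIMITS.values()))


def clip_action(action: Sequence[float]) -> List[float]:
    """Clip each joint command to the action range [-2, 2]."""
    if len(action) != ACTION_DIM:
        raise ValueError(f"expected {ACTION_DIM} values, got {len(action)}")
    return [min(max(a, ACTION_LOW), ACTION_HIGH) for a in action]
```

For instance, split_observation(list(range(23)))["goal_xyz"] gives [20, 21, 22], and an all-zero observation passes joints_within_limits because 0 lies inside (or on the boundary of) every listed joint range.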