Training in Task Space to Speed Up and Guide Reinforcement Learning

  • 2019-03-06 07:39:11
  • Guillaume Bellegarda, Katie Byl
  • 11

Abstract

Recent breakthroughs in the reinforcement learning (RL) community have made significant advances towards learning and deploying policies on real world robotic systems. However, even with the current state-of-the-art algorithms and computational resources, these algorithms are still plagued with high sample complexity, and thus long training times, especially for high degree of freedom (DOF) systems. There are also concerns arising from lack of perceived stability or robustness guarantees from emerging policies. This paper aims at mitigating these drawbacks by: (1) modeling a complex, high DOF system with a representative simple one, (2) making explicit use of forward and inverse kinematics without forcing the RL algorithm to "learn" them on its own, and (3) learning locomotion policies in Cartesian space instead of joint space. In this paper these methods are applied to JPL's Robosimian, but can be readily used on any system with a base and end effector(s). These locomotion policies can be produced in just a few minutes, trained on a single laptop. We compare the robustness of the resulting learned policies to those of other control methods. An accompanying video for this paper can be found at https://youtu.be/xDxxSw5ahnc .
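
To make points (2) and (3) of the abstract concrete, the following is a minimal sketch of how a policy acting in Cartesian space might be wired to a joint-space robot through explicit forward and inverse kinematics. It is not the Robosimian model or the authors' implementation: the planar two-link arm, the link lengths `L1` and `L2`, the helper `apply_task_space_action`, and the choice of a clipped end-effector displacement as the action are all assumptions made for illustration.

```python
import math

# Hypothetical link lengths of a planar two-link arm (not Robosimian's geometry).
L1, L2 = 1.0, 1.0


def forward_kinematics(q1, q2):
    """Map joint angles (rad) to the end-effector position (x, y)."""
    x = L1 * math.cos(q1) + L2 * math.cos(q1 + q2)
    y = L1 * math.sin(q1) + L2 * math.sin(q1 + q2)
    return x, y


def inverse_kinematics(x, y):
    """Map an end-effector position to joint angles, choosing the q2 >= 0 branch."""
    c2 = (x * x + y * y - L1 * L1 - L2 * L2) / (2.0 * L1 * L2)
    c2 = max(-1.0, min(1.0, c2))  # clamp so unreachable targets map to full extension
    q2 = math.acos(c2)
    q1 = math.atan2(y, x) - math.atan2(L2 * math.sin(q2), L1 + L2 * math.cos(q2))
    return q1, q2


def apply_task_space_action(q, action, max_step=0.05):
    """Turn a Cartesian displacement (assumed action format) into joint targets.

    The policy never outputs joint angles directly: FK gives the current
    end-effector position, the clipped action shifts it, and IK converts
    the shifted position back to joint space.
    """
    x, y = forward_kinematics(*q)
    dx = max(-max_step, min(max_step, action[0]))
    dy = max(-max_step, min(max_step, action[1]))
    return inverse_kinematics(x + dx, y + dy)


if __name__ == "__main__":
    q = (0.3, 0.8)
    q_next = apply_task_space_action(q, (0.02, -0.01))
    print("joint targets:", q_next)
    print("end effector :", forward_kinematics(*q_next))
```

In a sketch like this the learning algorithm only has to explore a low-dimensional displacement space, while the kinematic maps, which are known in closed form, stay outside the learned policy.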