Abstract
Current reinforcement learning algorithms train the agent on forward-generated trajectories. Forward-generated trajectories give the agent little guidance, so the agent can explore as much as possible. Although the strength of reinforcement learning comes from sufficient exploration, this comes at the cost of sample efficiency. Sample efficiency is an important factor that determines the performance of an algorithm. Prior work uses reward shaping techniques and changes to the network structure to increase sample efficiency; however, these methods require many steps to implement. In this work, we propose a novel reverse curriculum reinforcement learning method. Reverse curriculum learning starts training the agent using the backward trajectory of the episode rather than the original forward trajectory. This gives the agent a strong reward signal, so the agent can learn in a more sample-efficient manner. Moreover, our method requires only a minor change to the algorithm: reversing the order of the trajectory before training the agent. Therefore, it can be easily applied to any state-of-the-art algorithm.
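The idea of reversing the trajectory before training can be illustrated with a minimal sketch. It is not the implementation used in this work. It assumes a tabular Q-learning update on a hypothetical one-action chain environment with a single sparse reward at the final step, and all names (`reverse_trajectory`, `q_update_pass`, `make_chain_episode`, `count_informed_states`) are illustrative. The sketch applies one pass of updates over the same episode in forward and in reversed order and counts how many states receive a non-zero value estimate.

```python
from typing import List, Tuple

# (state, action, reward, next_state, done); a hypothetical transition format
Transition = Tuple[int, int, float, int, bool]


def reverse_trajectory(trajectory: List[Transition]) -> List[Transition]:
    """Return the transitions of one episode in last-to-first order."""
    return list(reversed(trajectory))


def q_update_pass(q: List[List[float]], trajectory: List[Transition],
                  alpha: float = 1.0, gamma: float = 0.9) -> List[List[float]]:
    """Apply one tabular Q-learning update per transition, in the given order."""
    for s, a, r, s_next, done in trajectory:
        target = r if done else r + gamma * max(q[s_next])
        q[s][a] += alpha * (target - q[s][a])
    return q


def make_chain_episode(n_states: int) -> List[Transition]:
    """Walk right along a chain; reward 1.0 only on reaching the last state."""
    episode = []
    for s in range(n_states - 1):
        s_next = s + 1
        done = s_next == n_states - 1
        episode.append((s, 0, 1.0 if done else 0.0, s_next, done))
    return episode


def count_informed_states(q: List[List[float]]) -> int:
    """Count states whose value estimate has moved away from zero."""
    return sum(1 for row in q if any(v != 0.0 for v in row))


n = 5
episode = make_chain_episode(n)

q_fwd = [[0.0] for _ in range(n)]  # one action per state
q_rev = [[0.0] for _ in range(n)]

q_update_pass(q_fwd, episode)                      # original forward order
q_update_pass(q_rev, reverse_trajectory(episode))  # reversed order

print(count_informed_states(q_fwd), count_informed_states(q_rev))  # 1 4
```

In this toy setting, a single forward pass updates only the state next to the reward, while a single reversed pass propagates the reward back through every visited state. Whether the same effect carries over to function approximation and to the algorithms considered in this work is not shown by the sketch.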