Contrastive Initial State Buffer for Reinforcement Learning

Abstract

In Reinforcement Learning, the trade-off between exploration and exploitation poses a complex challenge for achieving efficient learning from limited samples. While recent works have been effective in leveraging past experiences for policy updates, they often overlook the potential of reusing past experiences for data collection. Independent of the underlying RL algorithm, we introduce the concept of a Contrastive Initial State Buffer, which strategically selects states from past experiences and uses them to initialize the agent in the environment in order to guide it toward more informative states. We validate our approach on two complex robotic tasks without relying on any prior information about the environment: (i) locomotion of a quadruped robot traversing challenging terrains and (ii) a quadcopter drone racing through a track. The experimental results show that our initial state buffer achieves higher task performance than the nominal baseline while also speeding up training convergence.
