Abstract
In sparse-reward scenarios of reinforcement learning (RL), the memory mechanism provides promising shortcuts to policy optimization by reflecting on past experiences, much as humans do. However, current memory-based RL methods simply store and reuse high-value policies. They do not further refine and filter diverse past experiences, which limits the capability of memory. In this paper, we propose AdaMemento, an adaptive memory-enhanced RL framework. Instead of memorizing only positive past experiences, we design a memory-reflection module that exploits both positive and negative experiences by learning to predict known local optimal policies from real-time states. To effectively gather informative trajectories for the memory, we further introduce a fine-grained intrinsic motivation paradigm, in which nuances between similar states can be precisely distinguished to guide exploration. The exploitation of past experiences and the exploration of new policies are then adaptively coordinated by ensemble learning to approach the global optimum. Furthermore, we theoretically prove the superiority of our new intrinsic motivation and ensemble mechanism. Through 59 quantitative and visualization experiments, we confirm that AdaMemento can distinguish subtle states for better exploration and effectively exploit past experiences in memory, achieving significant improvements over previous methods.