Organizing Experience: A Deeper Look at Replay Mechanisms for Sample-based Planning in Continuous State Domains

  • 2018-06-12 16:07:31
  • Yangchen Pan, Muhammad Zaheer, Adam White, Andrew Patterson, Martha White


Model-based strategies for control are critical to obtain sample efficient learning. Dyna is a planning paradigm that naturally interleaves learning and planning, by simulating one-step experience to update the action-value function. This elegant planning strategy has been mostly explored in the tabular setting. The aim of this paper is to revisit sample-based planning, in stochastic and continuous domains with learned models. We first highlight the flexibility afforded by a model over Experience Replay (ER). Replay-based methods can be seen as stochastic planning methods that repeatedly sample from a buffer of recent agent-environment interactions and perform updates to improve data efficiency. We show that a model, as opposed to a replay buffer, is particularly useful for specifying which states to sample from during planning, such as predecessor states that propagate information in reverse from a state more quickly. We introduce a semi-parametric model learning approach, called Reweighted Experience Models (REMs), that makes it simple to sample next states or predecessors. We demonstrate that REM-Dyna exhibits similar advantages over replay-based methods in learning in continuous state problems, and that the performance gap grows when moving to stochastic domains, of increasing size.
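
For readers unfamiliar with Dyna, the interleaving of learning and planning mentioned in the abstract can be illustrated with a minimal tabular Dyna-Q sketch. This is not the paper's REM-Dyna, which targets continuous states with a learned semi-parametric model. The chain environment, the function names (step, greedy, dyna_q), and all hyperparameters are assumptions chosen only for illustration.

```python
import random

def step(state, action, n_states=6):
    """Hypothetical deterministic chain: action 1 moves right, action 0 moves left."""
    nxt = min(state + 1, n_states - 1) if action == 1 else max(state - 1, 0)
    done = nxt == n_states - 1
    return nxt, (1.0 if done else 0.0), done

def greedy(q, s):
    """Greedy action with random tie-breaking."""
    best = max(q[s])
    return random.choice([a for a in (0, 1) if q[s][a] == best])

def dyna_q(episodes=30, planning_steps=10, alpha=0.5, gamma=0.9,
           epsilon=0.1, n_states=6, seed=0):
    random.seed(seed)
    q = [[0.0, 0.0] for _ in range(n_states)]
    model = {}  # (s, a) -> (r, s', done), filled in from real experience
    for _ in range(episodes):
        s, done = 0, False
        while not done:
            a = random.randrange(2) if random.random() < epsilon else greedy(q, s)
            s2, r, done = step(s, a, n_states)
            # Learning: one-step Q-learning update from the real transition.
            target = r + (0.0 if done else gamma * max(q[s2]))
            q[s][a] += alpha * (target - q[s][a])
            model[(s, a)] = (r, s2, done)
            # Planning: extra updates from simulated one-step experience.
            for _ in range(planning_steps):
                (ps, pa), (pr, ps2, pdone) = random.choice(list(model.items()))
                ptarget = pr + (0.0 if pdone else gamma * max(q[ps2]))
                q[ps][pa] += alpha * (ptarget - q[ps][pa])
            s = s2
    return q
```

In this sketch the planning loop samples previously seen state-action pairs uniformly, which for a deterministic environment behaves much like replaying a buffer. The abstract's argument is that a learned model allows other choices of which states to sample, such as predecessors of a state whose value has just changed.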

