Abstract
When environmental interaction is expensive, model-based reinforcementlearning offers a solution by planning ahead and avoiding costly mistakes.Model-based agents typically learn a single-step transition model. In thispaper, we propose a multi-step model that predicts the outcome of an actionsequence with variable length. We show that this model is easy to learn, andthat the model can make policy-conditional predictions. We report preliminaryresults that show a clear advantage for the multi-step model compared to itsone-step counterpart.
Quick Read (beta)
loading the full paper ...