A Contraction Approach to Model-based Reinforcement Learning

Abstract

Model-based Reinforcement Learning has shown considerable experimental success. However, a theoretical understanding of it is still lacking. To this end, we analyze the error in cumulative reward for both stochastic and deterministic transitions using a contraction approach. We show that this approach does not require strong assumptions and can recover the typical error that is quadratic in the horizon. We prove that branched rollouts can reduce this error and are essential for deterministic transitions to have a Bellman contraction. Our results also apply to Imitation Learning, where we prove that GAN-type learning is better than Behavioral Cloning in continuous state and action spaces.
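The two technical ingredients named above, a Bellman contraction and a model error that grows quadratically with the horizon, can be illustrated on a toy problem. The sketch below is not the paper's analysis. It is a minimal pure-Python check, assuming a small finite MDP under a fixed policy (so P is a state-to-state transition matrix), rewards in [0, r_max], and a discount gamma with effective horizon H = 1/(1 - gamma). It verifies numerically that the policy-evaluation Bellman operator is a gamma-contraction in the sup norm, and that the value error of a perturbed model P_hat stays below the classical simulation-lemma bound gamma * eps1 * r_max / (1 - gamma)^2, where eps1 is the largest per-state L1 error of the model. The bound follows from V - V_hat = gamma (P - P_hat) V + gamma P_hat (V - V_hat) together with the contraction property. All function names (`bellman`, `evaluate`, `perturb`, ...) are hypothetical, and `perturb` is a stand-in for a learned model.

```python
import random


def random_stochastic_matrix(n, rng):
    """Return an n x n row-stochastic matrix (each row is a probability distribution)."""
    rows = []
    for _ in range(n):
        weights = [rng.random() + 1e-3 for _ in range(n)]
        total = sum(weights)
        rows.append([w / total for w in weights])
    return rows


def bellman(P, r, gamma, V):
    """Policy-evaluation Bellman operator: (T V)(s) = r(s) + gamma * sum_s' P[s][s'] * V(s')."""
    return [r[s] + gamma * sum(p * v for p, v in zip(P[s], V)) for s in range(len(r))]


def sup_norm_diff(a, b):
    """Sup-norm distance ||a - b||_inf between two value vectors."""
    return max(abs(x - y) for x, y in zip(a, b))


def evaluate(P, r, gamma, tol=1e-10):
    """Value of the fixed policy: iterate T until successive iterates agree to tol."""
    V = [0.0] * len(r)
    while True:
        V_next = bellman(P, r, gamma, V)
        if sup_norm_diff(V, V_next) < tol:
            return V_next
        V = V_next


def perturb(P, eps, rng):
    """Hypothetical 'learned model': mix each row with a random distribution (L1 row error <= 2*eps)."""
    Q = random_stochastic_matrix(len(P), rng)
    return [[(1 - eps) * p + eps * q for p, q in zip(p_row, q_row)] for p_row, q_row in zip(P, Q)]


rng = random.Random(0)
n, gamma, r_max = 5, 0.9, 1.0
P = random_stochastic_matrix(n, rng)
r = [rng.uniform(0.0, r_max) for _ in range(n)]

# 1) Contraction: ||T V1 - T V2||_inf <= gamma * ||V1 - V2||_inf for any V1, V2.
V1 = [rng.uniform(-10.0, 10.0) for _ in range(n)]
V2 = [rng.uniform(-10.0, 10.0) for _ in range(n)]
ratio = sup_norm_diff(bellman(P, r, gamma, V1), bellman(P, r, gamma, V2)) / sup_norm_diff(V1, V2)
print(f"contraction ratio {ratio:.4f} <= gamma = {gamma}")

# 2) Model error -> value error (simulation-lemma bound, quadratic in H = 1 / (1 - gamma)).
P_hat = perturb(P, 0.05, rng)
eps1 = max(sum(abs(p - q) for p, q in zip(p_row, q_row)) for p_row, q_row in zip(P, P_hat))
V_true = evaluate(P, r, gamma)
V_model = evaluate(P_hat, r, gamma)
value_err = sup_norm_diff(V_true, V_model)
bound = gamma * eps1 * r_max / (1 - gamma) ** 2
print(f"value error {value_err:.4f} <= bound {bound:.4f}")
```

Raising gamma from 0.9 to 0.99 multiplies the factor gamma / (1 - gamma)^2 by about 110, which is the quadratic horizon dependence in the discounted setting.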
