Model-Based Imitation Learning with Accelerated Convergence

Abstract

Sample efficiency is critical in solving real-world reinforcement learning problems, where agent-environment interactions can be costly. Imitation learning from expert advice has proved to be an effective strategy for reducing the number of interactions required to train a policy. Online imitation learning, a specific type of imitation learning that interleaves policy evaluation and policy optimization, is a particularly effective framework for training policies with provable performance guarantees. In this work, we seek to further accelerate the convergence rate of online imitation learning, making it more sample efficient. We propose two model-based algorithms inspired by Follow-the-Leader (FTL) with prediction: MoBIL-VI based on solving variational inequalities and MoBIL-Prox based on stochastic first-order updates. When a dynamics model is learned online, these algorithms can provably accelerate the best known convergence rate up to an order. Our algorithms can be viewed as a generalization of stochastic Mirror-Prox by Juditsky et al. (2011), and admit a simple constructive FTL-style analysis of performance. The algorithms are also empirically validated in simulation.
