A Pontryagin Perspective on Reinforcement Learning

Abstract

Reinforcement learning has traditionally focused on learning state-dependent policies to solve optimal control problems in a closed-loop fashion. In this work, we introduce the paradigm of open-loop reinforcement learning, where a fixed action sequence is learned instead. We present three new algorithms: one robust model-based method and two sample-efficient model-free methods. Rather than basing our algorithms on Bellman's equation from dynamic programming, our work builds on Pontryagin's principle from the theory of open-loop optimal control. We provide convergence guarantees and evaluate all methods empirically on a pendulum swing-up task, as well as on two high-dimensional MuJoCo tasks, significantly outperforming existing baselines.
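To make the open-loop idea concrete, the following is a minimal sketch of optimizing a fixed action sequence with the generic discrete-time Pontryagin (costate, or adjoint) recursion. It is not the paper's algorithm. It assumes a known toy scalar linear model x_{t+1} = A x_t + B u_t with quadratic reward, and the names `rollout`, `pontryagin_gradient`, `optimize` and the constants `A`, `B`, `C` are hypothetical. The recipe is: simulate forward, run the costate backward with lambda_T = dr_T/dx and lambda_t = dr/dx + (df/dx) lambda_{t+1}, read off dJ/du_t = dr/du + (df/du) lambda_{t+1}, and take a gradient-ascent step on the actions.

```python
from typing import List, Tuple

# Hypothetical toy model (not from the paper): x_{t+1} = A*x_t + B*u_t,
# running reward r(x, u) = -(x^2 + C*u^2), terminal reward r_T(x) = -x^2.
A, B, C = 1.0, 0.5, 0.1


def rollout(x0: float, actions: List[float]) -> Tuple[List[float], float]:
    """Open-loop simulation: the actions are applied regardless of the visited states."""
    xs = [x0]
    ret = 0.0
    for u in actions:
        x = xs[-1]
        ret -= x * x + C * u * u       # running reward r(x_t, u_t)
        xs.append(A * x + B * u)       # dynamics f(x_t, u_t)
    ret -= xs[-1] ** 2                 # terminal reward r_T(x_T)
    return xs, ret


def pontryagin_gradient(x0: float, actions: List[float]) -> List[float]:
    """dJ/du_t for every t via one forward rollout and one backward costate pass."""
    xs, _ = rollout(x0, actions)
    T = len(actions)
    lam = -2.0 * xs[T]                 # lambda_T = d r_T / dx at x_T
    grad = [0.0] * T
    for t in reversed(range(T)):
        # Here lam holds lambda_{t+1}.
        grad[t] = -2.0 * C * actions[t] + B * lam   # dr/du + (df/du) * lambda_{t+1}
        lam = -2.0 * xs[t] + A * lam                # lambda_t = dr/dx + (df/dx) * lambda_{t+1}
    return grad


def optimize(x0: float, T: int = 10, lr: float = 0.02, iters: int = 500) -> List[float]:
    """Plain gradient ascent on the open-loop action sequence."""
    actions = [0.0] * T
    for _ in range(iters):
        g = pontryagin_gradient(x0, actions)
        actions = [u + lr * gu for u, gu in zip(actions, g)]
    return actions


if __name__ == "__main__":
    x0 = 1.0
    _, before = rollout(x0, [0.0] * 10)
    _, after = rollout(x0, optimize(x0))
    print(f"return before: {before:.3f}, after: {after:.3f}")
```

Because the sketch differentiates a known model, it is closest in spirit to a model-based setting; it does not show how the paper's model-free methods obtain the required derivatives. The learned object is a list of actions, not a function of the state, which is the distinction from closed-loop policies drawn in the abstract.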
