Imitation-Projected Programmatic Reinforcement Learning

Abstract

We study the problem of programmatic reinforcement learning, in which policies are represented as short programs in a symbolic language. Programmatic policies can be more interpretable, generalizable, and amenable to formal verification than neural policies; however, designing rigorous learning approaches for such policies remains a challenge. Our approach to this challenge -- a meta-algorithm called PROPEL -- is based on three insights. First, we view our learning task as optimization in policy space, modulo the constraint that the desired policy has a programmatic representation, and solve this optimization problem using a form of mirror descent that takes a gradient step into the unconstrained policy space and then projects back onto the constrained space. Second, we view the unconstrained policy space as mixing neural and programmatic representations, which enables employing state-of-the-art deep policy gradient approaches. Third, we cast the projection step as program synthesis via imitation learning, and exploit contemporary combinatorial methods for this task. We present theoretical convergence results for PROPEL and empirically evaluate the approach in three continuous control domains. The experiments show that PROPEL can significantly outperform state-of-the-art approaches for learning programmatic policies.
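To make the "gradient step, then project" structure concrete, the following is a minimal toy sketch and not the PROPEL implementation. Everything in it is an assumption made for illustration: the finite state set, the quadratic surrogate objective, the restriction of "programs" to linear rules of the form a*s + b, and the names `fit_linear` and `lift_update_project`. The gradient step is taken on a table of per-state actions, which stands in for the unconstrained policy space, and the projection is a least-squares fit that imitates the updated table, which stands in for program synthesis via imitation learning.

```python
def fit_linear(states, actions):
    """Projection (toy): least-squares fit of a linear rule a*s + b
    that imitates the given per-state actions."""
    n = len(states)
    mean_s = sum(states) / n
    mean_u = sum(actions) / n
    cov = sum((s - mean_s) * (u - mean_u) for s, u in zip(states, actions))
    var = sum((s - mean_s) ** 2 for s in states)
    a = cov / var
    return a, mean_u - a * mean_s


def lift_update_project(states, target, steps=30, lr=0.25):
    """Alternate an unconstrained gradient step with a projection.

    Hypothetical surrogate objective: J(h) = -sum_s (h(s) - target(s))^2,
    so dJ/dh(s) = -2 * (h(s) - target(s)).
    """
    a, b = 0.0, 0.0  # initial "program"
    for _ in range(steps):
        # Lift: evaluate the current program as a table of actions.
        h = [a * s + b for s in states]
        # Update: gradient ascent on J in the unconstrained (tabular) space.
        h = [u + lr * (-2.0) * (u - t) for u, t in zip(h, target)]
        # Project: imitate the updated table with a linear rule.
        a, b = fit_linear(states, h)
    return a, b


STATES = [0.0, 0.5, 1.0, 1.5, 2.0]
TARGET = [s ** 2 for s in STATES]  # best actions are nonlinear in the state
a, b = lift_update_project(STATES, TARGET)
print(a, b)
```

In this toy setting the iterates approach the best linear imitation of the nonlinear target, roughly a = 2 and b = -0.5. The sketch uses a Euclidean projection, so it is plain projected gradient ascent, a special case of the mirror descent view described in the abstract.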
