Programmatically Interpretable Reinforcement Learning

Abstract

We present a reinforcement learning framework, called Programmatically Interpretable Reinforcement Learning (PIRL), that is designed to generate interpretable and verifiable agent policies. Unlike the popular Deep Reinforcement Learning (DRL) paradigm, which represents policies by neural networks, PIRL represents policies using a high-level, domain-specific programming language. Such programmatic policies have the benefits of being more easily interpreted than neural networks, and being amenable to verification by symbolic methods. We propose a new method, called Neurally Directed Program Search (NDPS), for solving the challenging nonsmooth optimization problem of finding a programmatic policy with maximal reward. NDPS works by first learning a neural policy network using DRL, and then performing a local search over programmatic policies that seeks to minimize a distance from this neural "oracle". We evaluate NDPS on the task of learning to drive a simulated car in the TORCS car-racing environment. We demonstrate that NDPS is able to discover human-readable policies that pass some significant performance bars. We also show that PIRL policies can have smoother trajectories, and can be more easily transferred to environments not encountered during training, than corresponding policies discovered by DRL.
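
The core idea stated above, a local search over programmatic policies that minimizes a distance from a neural oracle, can be illustrated with a toy sketch. This is not the NDPS algorithm as implemented by the authors. Everything in it is an assumption made for illustration: the oracle is a hypothetical closed-form function standing in for a trained DRL network, the "programs" are a two-parameter linear controller template rather than a domain-specific language, the distance is mean squared action error on randomly sampled states, and the search is plain greedy hill climbing.

```python
import random

def oracle(state):
    # Hypothetical stand-in for a trained neural policy (assumption:
    # a real oracle would be a DRL network, not a closed-form function).
    err, d_err = state
    return 0.8 * err + 0.3 * d_err

def make_program(params):
    # Illustrative "programmatic policy": a fixed template with two
    # tunable constants. The template itself is an assumption.
    kp, kd = params
    def policy(state):
        err, d_err = state
        return kp * err + kd * d_err
    return policy

def distance(params, states):
    # Mean squared difference between program and oracle actions.
    policy = make_program(params)
    return sum((policy(s) - oracle(s)) ** 2 for s in states) / len(states)

def neighbors(params, step=0.1):
    # Candidates that differ from the current program in one constant.
    for i in range(len(params)):
        for delta in (-step, step):
            cand = list(params)
            cand[i] = round(cand[i] + delta, 10)
            yield tuple(cand)

def local_search(start, states, max_iters=100):
    # Greedy hill climbing: move to the best neighbor until none improves.
    current, best = start, distance(start, states)
    for _ in range(max_iters):
        cand = min(neighbors(current), key=lambda p: distance(p, states))
        cand_dist = distance(cand, states)
        if cand_dist >= best:
            break
        current, best = cand, cand_dist
    return current, best

rng = random.Random(0)  # fixed seed so the sketch is reproducible
states = [(rng.uniform(-1, 1), rng.uniform(-1, 1)) for _ in range(200)]
found_params, found_dist = local_search((0.0, 0.0), states)
print(found_params, found_dist)
```

Because the oracle here lies inside the template's search space, the search can drive the distance to zero. In a realistic setting the program class is far less expressive than the network, so the search would settle on the closest program it can find rather than an exact match.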
