Verifiable Reinforcement Learning via Policy Extraction

  • 2018-05-22 00:14:32
  • Osbert Bastani, Yewen Pu, Armando Solar-Lezama
  • 6

Abstract

While deep reinforcement learning has successfully solved many challenging control tasks, its real-world applicability has been limited by the inability to ensure the safety of learned policies. We propose an approach to verifiable reinforcement learning by training decision tree policies, which can represent complex policies (since they are nonparametric), yet can be efficiently verified using existing techniques (since they are highly structured). The challenge is that decision tree policies are difficult to train. We propose VIPER, an algorithm that combines ideas from model compression and imitation learning to learn decision tree policies guided by a DNN policy (called the oracle) and its Q-function, and show that it substantially outperforms two baselines. We use VIPER to (i) learn a provably robust decision tree policy for a variant of Atari Pong with a symbolic state space, (ii) learn a decision tree policy for a toy game based on Pong that provably never loses, and (iii) learn a provably stable decision tree policy for cart-pole. In each case, the decision tree policy achieves performance equal to that of the original DNN policy.
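The abstract does not spell out how verification proceeds, so the following is only an illustrative sketch of why a highly structured policy is easier to check than a DNN. It is not the paper's verification procedure. It assumes an axis-aligned decision tree, in which every leaf corresponds to a box of states, so a property of the form "in this region the policy always picks this action" can be checked by enumerating the finitely many leaves that overlap the region. The tree, the single `offset` feature (paddle position minus ball position), the action names, and the property are all hypothetical and chosen for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple, Union

# A box maps a feature index to a (low, high) interval of values.
Box = Dict[int, Tuple[float, float]]

LEFT, STAY, RIGHT = 0, 1, 2  # hypothetical action encoding


@dataclass
class Leaf:
    action: int


@dataclass
class Node:
    feature: int      # index of the state feature tested at this node
    threshold: float  # go left if state[feature] <= threshold, else right
    left: "Tree"
    right: "Tree"


Tree = Union[Leaf, Node]


def act(tree: Tree, state: List[float]) -> int:
    """Run the policy on one concrete state."""
    while isinstance(tree, Node):
        tree = tree.left if state[tree.feature] <= tree.threshold else tree.right
    return tree.action


def leaf_regions(tree: Tree, box: Box) -> List[Tuple[Box, int]]:
    """List (sub-box, action) for every leaf reachable from states in `box`.

    Sub-boxes are closed intervals, so the right branch slightly
    over-approximates (it includes the threshold itself). That keeps the
    check sound: a property that holds on the over-approximation also
    holds on the true region.
    """
    if isinstance(tree, Leaf):
        return [(box, tree.action)]
    lo, hi = box[tree.feature]
    regions: List[Tuple[Box, int]] = []
    if lo <= tree.threshold:  # some state in the box takes the left branch
        left_box = dict(box)
        left_box[tree.feature] = (lo, min(hi, tree.threshold))
        regions += leaf_regions(tree.left, left_box)
    if hi > tree.threshold:   # some state in the box takes the right branch
        right_box = dict(box)
        right_box[tree.feature] = (max(lo, tree.threshold), hi)
        regions += leaf_regions(tree.right, right_box)
    return regions


# Hypothetical hand-written policy over one feature, offset = paddle - ball.
policy: Tree = Node(0, -0.1, Leaf(RIGHT), Node(0, 0.1, Leaf(STAY), Leaf(LEFT)))

# Property: whenever the paddle is far to the right of the ball
# (offset in [0.5, 1.0]), the policy moves left.
far_right: Box = {0: (0.5, 1.0)}
property_holds = all(a == LEFT for _, a in leaf_regions(policy, far_right))
print("property holds:", property_holds)
```

The check visits each reachable leaf once, so its cost grows with the size of the tree rather than with the number of states. Properties that involve the environment dynamics, such as never losing or stability, need more machinery than this enumeration.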

 
