Policy Certificates: Towards Accountable Reinforcement Learning

  • 2018-11-07 18:16:28
  • Christoph Dann, Lihong Li, Wei Wei, Emma Brunskill
The performance of a reinforcement learning algorithm can vary drastically during learning because of exploration. Existing algorithms provide little information about their current policy's quality before executing it, and thus have limited use in high-stakes applications like healthcare. In this paper, we address such a lack of accountability by proposing that algorithms output policy certificates, which upper bound the suboptimality in the next episode, allowing humans to intervene when the certified quality is not satisfactory. We further present a new learning framework (IPOC) for finite-sample analysis with policy certificates, and develop two IPOC algorithms that enjoy guarantees for the quality of both their policies and certificates.
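To make the idea of a certificate concrete, here is a minimal sketch for a finite-horizon tabular MDP. It assumes the certificate is formed as the gap between an optimistic upper bound on the optimal value and a pessimistic lower bound on the value of the policy about to be executed. If V*(s0) ≤ upper and V^π(s0) ≥ lower, then V*(s0) − V^π(s0) ≤ upper − lower. The names `certify_policy` and `should_intervene`, the known-reward assumption, and the per-pair `bonus` widths are hypothetical illustrations and are not the paper's IPOC algorithms. The bound is only valid if the bonuses really cover the model's estimation error.

```python
from typing import List, Tuple


def expected_value(p_row: List[float], v: List[float]) -> float:
    """Expectation of next-step values v under transition probabilities p_row."""
    return sum(p * x for p, x in zip(p_row, v))


def certify_policy(P_hat, r, bonus, H: int, s0: int) -> Tuple[List[List[int]], float]:
    """Backward induction with optimistic and pessimistic value estimates.

    Hypothetical inputs (assumptions, not from the paper):
      P_hat[s][a][s'] : empirical transition probabilities
      r[s][a]         : known rewards in [0, 1]
      bonus[s][a]     : confidence widths (e.g. c / sqrt(n(s, a)))
    Returns the greedy-optimistic policy and its certificate at state s0.
    """
    S, A = len(r), len(r[0])
    v_up = [0.0] * S  # optimistic value at step h + 1 (zero past the horizon)
    v_lo = [0.0] * S  # pessimistic value of the chosen policy at step h + 1
    policy = [[0] * S for _ in range(H)]
    for h in reversed(range(H)):
        cap = float(H - h)  # largest return achievable from step h with rewards in [0, 1]
        new_up, new_lo = [0.0] * S, [0.0] * S
        for s in range(S):
            # Optimistic Q-values, clipped to the feasible range [0, cap].
            q_up = [
                min(cap, max(0.0, r[s][a] + bonus[s][a] + expected_value(P_hat[s][a], v_up)))
                for a in range(A)
            ]
            a_star = max(range(A), key=lambda a: q_up[a])  # act greedily w.r.t. optimism
            # Pessimistic value of the same action, so it lower-bounds the policy's value.
            q_lo = min(cap, max(0.0, r[s][a_star] - bonus[s][a_star]
                                + expected_value(P_hat[s][a_star], v_lo)))
            policy[h][s] = a_star
            new_up[s], new_lo[s] = q_up[a_star], q_lo
        v_up, v_lo = new_up, new_lo
    # Certificate: upper bound on V*(s0) - V^pi(s0), valid if the bonuses are valid.
    return policy, v_up[s0] - v_lo[s0]


def should_intervene(certificate: float, tolerance: float) -> bool:
    """Flag the next episode for human review if the certified gap is too large."""
    return certificate > tolerance


# Usage: one state, two actions, horizon 3, bonus 0.1 on every pair.
policy, cert = certify_policy([[[1.0], [1.0]]], [[0.5, 1.0]], [[0.1, 0.1]], H=3, s0=0)
print(policy, round(cert, 6), should_intervene(cert, tolerance=0.2))
```

Shrinking the bonuses, as more data is collected, tightens the gap between the two estimates, so the certificate decreases. This matches the abstract's point that a human can decide whether to let the next episode run based on the certified quality.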

