Abstract
Deep reinforcement learning is an increasingly popular technique forsynthesising policies to control an agent's interaction with its environment.There is also growing interest in formally verifying that such policies arecorrect and execute safely. Progress has been made in this area by building onexisting work for verification of deep neural networks and of continuous-statedynamical systems. In this paper, we tackle the problem of verifyingprobabilistic policies for deep reinforcement learning, which are used to, forexample, tackle adversarial environments, break symmetries and managetrade-offs. We propose an abstraction approach, based on interval Markovdecision processes, that yields probabilistic guarantees on a policy'sexecution, and present techniques to build and solve these models usingabstract interpretation, mixed-integer linear programming, entropy-basedrefinement and probabilistic model checking. We implement our approach andillustrate its effectiveness on a selection of reinforcement learningbenchmarks.