Multi-agent reinforcement learning (RL) often struggles to ensure the safebehaviours of the learning agents, and therefore it is generally not adapted tosafety-critical applications. To address this issue, we present a methodologythat combines formal verification with (deep) RL algorithms to guarantee thesatisfaction of formally-specified safety constraints both in training andtesting. The approach we propose expresses the constraints to verify inProbabilistic Computation Tree Logic (PCTL) and builds an abstractrepresentation of the system to reduce the complexity of the verification step.This abstract model allows for model checking techniques to identify a set ofabstract policies that meet the safety constraints expressed in PCTL. Then, theagents' behaviours are restricted according to these safe abstract policies. Weprovide formal guarantees that by using this method, the actions of the agentsalways meet the safety constraints, and provide a procedure to generate anabstract model automatically. We empirically evaluate and show theeffectiveness of our method in a multi-agent environment.