Logically-Constrained Reinforcement Learning

  • 2018-05-17 14:06:28
  • Mohammadhosein Hasanbeig, Alessandro Abate, Daniel Kroening
  • 0

Abstract

This paper proposes a Reinforcement Learning (RL) algorithm to synthesizepolicies for a Markov Decision Process (MDP), such that a linear time propertyis satisfied. We convert the property into a Limit Deterministic BuchiAutomaton (LDBA), then construct a product MDP between the automaton and theoriginal MDP. A reward function is then assigned to the states of the productautomaton, according to accepting conditions of the LDBA. With this rewardfunction, our algorithm synthesizes a policy that satisfies the linear timeproperty: as such, the policy synthesis procedure is "constrained" by the givenspecification. Additionally, we show that the RL procedure sets up an onlinevalue iteration method to calculate the maximum probability of satisfying thegiven property, at any given state of the MDP - a convergence proof for theprocedure is provided. Finally, the performance of the algorithm is evaluatedvia a set of numerical examples. We observe an improvement of one order ofmagnitude in the number of iterations required for the synthesis compared toexisting approaches.

 

Quick Read (beta)

loading the full paper ...