HypRL: Reinforcement Learning of Control Policies for Hyperproperties

Abstract

Reward shaping in multi-agent reinforcement learning (MARL) for complex tasks remains a significant challenge. Existing approaches often fail to find optimal solutions or cannot handle such tasks efficiently. We propose HypRL, a specification-guided reinforcement learning framework that learns control policies with respect to hyperproperties expressed in HyperLTL. Hyperproperties constitute a powerful formalism for specifying objectives and constraints over sets of execution traces across agents. To learn policies that maximize the satisfaction of a HyperLTL formula $\phi$, we apply Skolemization to manage quantifier alternations and define quantitative robustness functions to shape rewards over execution traces of a Markov decision process with unknown transitions. A suitable RL algorithm is then used to learn policies that collectively maximize the expected reward and, consequently, increase the probability of satisfying $\phi$. We evaluate HypRL on a diverse set of benchmarks, including safety-aware planning, Deep Sea Treasure, and the Post Correspondence Problem. We also compare with specification-driven baselines to demonstrate the effectiveness and efficiency of HypRL.
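The abstract does not spell out the robustness functions or the Skolemization step, so the following is only a minimal sketch of the general idea, not HypRL's implementation. It assumes standard min/max quantitative semantics (worst-case margin for "globally", best-case margin for "eventually"), scalar positions as states, and a formula of the shape $\forall \pi.\, \exists \pi'.\, \mathbf{G}(|x_\pi - x_{\pi'}| \geq d)$. The existential trace $\pi'$ is produced by a Skolem function that sees only the prefix of the universal trace $\pi$, and the universal quantifier is approximated by a minimum over sampled traces. All names (`robustness_globally`, `keep_distance`, `skolemized_rollout`, `forall_exists_reward`) are hypothetical.

```python
from typing import Callable, List, Sequence

# Hypothetical setting: a trace is a sequence of scalar positions, one per step.
Trace = Sequence[float]
# Quantitative atomic predicate over a pair of traces at step t:
# positive = satisfied with that margin, negative = violated by that margin.
PairPredicate = Callable[[Trace, Trace, int], float]


def robustness_globally(pred: PairPredicate, pi: Trace, pi2: Trace) -> float:
    """Robustness of G(pred) on finite traces: the worst margin over all steps."""
    horizon = min(len(pi), len(pi2))
    return min(pred(pi, pi2, t) for t in range(horizon))


def robustness_eventually(pred: PairPredicate, pi: Trace, pi2: Trace) -> float:
    """Robustness of F(pred) on finite traces: the best margin over all steps."""
    horizon = min(len(pi), len(pi2))
    return max(pred(pi, pi2, t) for t in range(horizon))


def keep_distance(min_gap: float) -> PairPredicate:
    """Atomic predicate |x_pi(t) - x_pi'(t)| >= min_gap, returned as a signed margin."""
    def pred(pi: Trace, pi2: Trace, t: int) -> float:
        return abs(pi[t] - pi2[t]) - min_gap
    return pred


def skolemized_rollout(
    leader: Trace, skolem: Callable[[Trace], float]
) -> List[float]:
    """Build the existential trace pi' from the universal trace pi.

    After Skolemization, "exists pi'" becomes a function of pi; here the
    function only sees pi's prefix up to step t, so it stays causal.
    """
    return [skolem(leader[: t + 1]) for t in range(len(leader))]


def forall_exists_reward(
    leaders: Sequence[Trace],
    skolem: Callable[[Trace], float],
    pred: PairPredicate,
) -> float:
    """Episode reward for forall pi. exists pi'. G(pred).

    Assumption: "forall pi" is approximated by the minimum over sampled traces.
    """
    return min(
        robustness_globally(pred, pi, skolemized_rollout(pi, skolem))
        for pi in leaders
    )


if __name__ == "__main__":
    pi = [0.0, 1.0, 2.0, 3.0]
    # Hypothetical Skolem function: stay 2 units ahead of the leader's current position.
    follow = lambda prefix: prefix[-1] + 2.0
    pi2 = skolemized_rollout(pi, follow)
    print(robustness_globally(keep_distance(1.5), pi, pi2))  # 0.5
```

A positive reward means every sampled trace satisfies the property with some margin, and the magnitude gives the RL algorithm a graded signal even when the property is violated, which a Boolean satisfaction check would not.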
