Reinforcement Learning for General LTL Objectives Is Intractable

Abstract

In recent years, researchers have made significant progress in devisingreinforcement-learning algorithms for optimizing linear temporal logic (LTL)objectives and LTL-like objectives. Despite these advancements, there arefundamental limitations to how well this problem can be solved that previousstudies have alluded to but, to our knowledge, have not examined in depth. Inthis paper, we address theoretically the hardness of learning with general LTLobjectives. We formalize the problem under the probably approximately correctlearning in Markov decision processes (PAC-MDP) framework, a standard frameworkfor measuring sample complexity in reinforcement learning. In thisformalization, we prove that the optimal policy for any LTL formula isPAC-MDP-learnable only if the formula is in the most limited class in the LTLhierarchy, consisting of only finite-horizon-decidable properties. Practically,our result implies that it is impossible for a reinforcement-learning algorithmto obtain a PAC-MDP guarantee on the performance of its learned policy afterfinitely many interactions with an unconstrained environment fornon-finite-horizon-decidable LTL objectives.

Quick Read (beta)

loading the full paper ...