On the (In)Tractability of Reinforcement Learning for LTL Objectives

Abstract

In recent years, researchers have made significant progress in devisingreinforcement-learning algorithms for optimizing linear temporal logic (LTL)objectives and LTL-like objectives. Despite these advancements, there arefundamental limitations to how well this problem can be solved. Previousstudies have alluded to this fact but have not examined it in depth. In thispaper, we address the tractability of reinforcement learning for general LTLobjectives from a theoretical perspective. We formalize the problem under theprobably approximately correct learning in Markov decision processes (PAC-MDP)framework, a standard framework for measuring sample complexity inreinforcement learning. In this formalization, we prove that the optimal policyfor any LTL formula is PAC-MDP-learnable if and only if the formula is in themost limited class in the LTL hierarchy, consisting of formulas that aredecidable within a finite horizon. Practically, our result implies that it isimpossible for a reinforcement-learning algorithm to obtain a PAC-MDP guaranteeon the performance of its learned policy after finitely many interactions withan unconstrained environment for LTL objectives that are not decidable within afinite horizon.

Quick Read (beta)

loading the full paper ...