Abstract
To obtain the optimal constraints in complex environments, InverseConstrained Reinforcement Learning (ICRL) seeks to recover these constraintsfrom expert demonstrations in a data-driven manner. Existing ICRL algorithmscollect training samples from an interactive environment. However, the efficacyand efficiency of these sampling strategies remain unknown. To bridge this gap,we introduce a strategic exploration framework with provable efficiency.Specifically, we define a feasible constraint set for ICRL problems andinvestigate how expert policy and environmental dynamics influence theoptimality of constraints. Motivated by our findings, we propose twoexploratory algorithms to achieve efficient constraint inference via 1)dynamically reducing the bounded aggregate error of cost estimation and 2)strategically constraining the exploration policy. Both algorithms aretheoretically grounded with tractable sample complexity. We empiricallydemonstrate the performance of our algorithms under various environments.