Is Optimal Transport Necessary for Inverse Reinforcement Learning?

Abstract

Inverse Reinforcement Learning (IRL) aims to recover a reward function from expert demonstrations. Recently, Optimal Transport (OT) methods have been successfully deployed to align trajectories and infer rewards. While OT-based methods have shown strong empirical results, they introduce algorithmic complexity and hyperparameter sensitivity, and they require solving OT optimization problems. In this work, we challenge the necessity of OT in IRL by proposing two simple, heuristic alternatives: (1) Minimum-Distance Reward, which assigns rewards based on the nearest expert state regardless of temporal order; and (2) Segment-Matching Reward, which incorporates lightweight temporal alignment by matching agent states to corresponding segments in the expert trajectory. These methods avoid optimization, exhibit linear-time complexity, and are easy to implement. Through extensive evaluations across 32 online and offline benchmarks with three reinforcement learning algorithms, we show that our simple rewards match or outperform recent OT-based approaches. Our findings suggest that the core benefits of OT may arise from basic proximity alignment rather than its optimal coupling formulation, advocating for a reevaluation of complexity in future IRL design.
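
The two heuristics can be illustrated with a minimal Python sketch. The abstract does not give the exact reward definitions, so everything below is an assumption made for illustration: the reward is taken to be the negative Euclidean distance to the matched expert state, the expert trajectory is split into equal contiguous segments (one per agent time step), and the function names are hypothetical.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two states given as equal-length sequences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def min_distance_rewards(agent, expert):
    """Assumed Minimum-Distance Reward: negative distance to the nearest
    expert state, ignoring temporal order."""
    return [-min(euclidean(s, e) for e in expert) for s in agent]


def segment_matching_rewards(agent, expert):
    """Assumed Segment-Matching Reward: agent step t is compared only with
    the t-th contiguous segment of the expert trajectory."""
    T, N = len(agent), len(expert)
    rewards = []
    for t, s in enumerate(agent):
        start = (t * N) // T
        # Guarantee a non-empty segment even when the agent trajectory
        # is longer than the expert trajectory.
        end = max(start + 1, ((t + 1) * N) // T)
        segment = expert[start:end]
        rewards.append(-min(euclidean(s, e) for e in segment))
    return rewards


# Toy example: the agent visits expert states, but in reverse order.
expert = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0)]
agent = [(3.0, 0.0), (0.0, 0.0)]

print(min_distance_rewards(agent, expert))
print(segment_matching_rewards(agent, expert))
```

In this toy case the order-agnostic reward is zero at both steps because each agent state coincides with some expert state, while the segment-based reward is -2.0 at both steps because each agent state is far from the expert segment assigned to its time step.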
