Abstract
In Imitation Learning (IL), utilizing suboptimal and heterogeneous demonstrations presents a substantial challenge due to the varied nature of real-world data. However, standard IL algorithms treat these datasets as homogeneous, thereby inheriting the deficiencies of suboptimal demonstrators. Previous approaches to this issue rely on impractical assumptions such as high-quality data subsets, confidence rankings, or explicit environmental knowledge. This paper introduces IRLEED, Inverse Reinforcement Learning by Estimating Expertise of Demonstrators, a novel framework that overcomes these hurdles without prior knowledge of demonstrator expertise. IRLEED enhances existing Inverse Reinforcement Learning (IRL) algorithms by combining a general model of demonstrator suboptimality, which accounts for reward bias and action variance, with a Maximum Entropy IRL framework to efficiently derive the optimal policy from diverse, suboptimal demonstrations. Experiments in both online and offline IL settings, with simulated and human-generated data, demonstrate IRLEED's adaptability and effectiveness, making it a versatile solution for learning from suboptimal demonstrations.