Abstract
Reinforcement Learning has achieved significant success in generating complex behavior but often requires extensive reward function engineering. Adversarial variants of Imitation Learning and Inverse Reinforcement Learning offer an alternative by learning policies from expert demonstrations via a discriminator. However, these methods struggle in complex tasks where randomly sampling expert-like behaviors is challenging. This limitation stems from their reliance on policy-agnostic discriminators, which provide insufficient guidance for agent improvement, especially as task complexity increases and expert behavior becomes more distinct. We introduce RILe (Reinforced Imitation Learning environment), a novel trainer-student system that learns a dynamic reward function based on the student's performance and alignment with expert demonstrations. In RILe, the student learns an action policy while the trainer, using reinforcement learning, continuously updates itself via the discriminator's feedback to optimize the alignment between the student and the expert. The trainer optimizes for long-term cumulative rewards from the discriminator, enabling it to provide nuanced feedback that accounts for the complexity of the task and the student's current capabilities. This approach allows for greater exploration of agent actions by providing graduated feedback rather than binary expert/non-expert classifications. By reducing dependence on policy-agnostic discriminators, RILe enables better performance in complex settings where traditional methods falter, outperforming existing methods by 2x in complex simulated robot-locomotion tasks.
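As a rough illustration of the reward routing described above, the following Python sketch shows which signal reaches which agent: the student is rewarded by the trainer, and the trainer is rewarded through the discriminator and judged on its discounted cumulative return. This is not the authors' implementation. The function names, the toy one-dimensional task, the fixed stand-in for the discriminator, and the hand-written trainer reward are all assumptions made for illustration, and no learning takes place.

```python
import math
from dataclasses import dataclass


@dataclass
class Step:
    state: float
    action: float
    student_reward: float  # issued by the trainer (graded, not binary)
    trainer_reward: float  # derived from the discriminator's score


def discriminator_score(state: float, action: float) -> float:
    # Hypothetical stand-in for a learned discriminator: a score in (0, 1],
    # higher when the action is close to the assumed expert action (= state).
    return math.exp(-abs(action - state))


def trainer_reward_fn(state: float, action: float, weight: float) -> float:
    # Hypothetical stand-in for the trainer's reward function. In the paper
    # the trainer is itself learned with RL; here it is a fixed formula.
    return -weight * abs(action - state)


def rollout(states, student_policy, weight=1.0):
    # Collect one trajectory, recording both reward streams at every step.
    steps = []
    for s in states:
        a = student_policy(s)
        steps.append(
            Step(s, a, trainer_reward_fn(s, a, weight), discriminator_score(s, a))
        )
    return steps


def discounted_return(rewards, gamma=0.99):
    # Long-term cumulative reward: sum over t of gamma**t * r_t.
    total = 0.0
    for r in reversed(rewards):
        total = r + gamma * total
    return total


states = [0.0, 1.0, 2.0]
expert_like = rollout(states, lambda s: s)       # student matches the expert
off_target = rollout(states, lambda s: s + 2.0)  # student is far from the expert

trainer_return_good = discounted_return([st.trainer_reward for st in expert_like])
trainer_return_bad = discounted_return([st.trainer_reward for st in off_target])
```

In this toy setup the trainer's return is higher when the student's behavior is closer to the assumed expert, which is the quantity the abstract says the trainer optimizes; how the trainer and discriminator are actually parameterized and updated is not specified here.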