Abstract
Acquiring complex behaviors is essential for artificially intelligent agents, yet learning these behaviors in high-dimensional settings poses a significant challenge due to the vast search space. Traditional reinforcement learning (RL) requires extensive manual effort for reward function engineering. Inverse reinforcement learning (IRL) uncovers reward functions from expert demonstrations but relies on an iterative process that is often computationally expensive. Imitation learning (IL) provides a more efficient alternative by directly comparing an agent's actions to expert demonstrations; however, in high-dimensional environments, such direct comparisons offer insufficient feedback for effective learning. We introduce RILe (Reinforced Imitation Learning), a framework that combines the strengths of imitation learning and inverse reinforcement learning to learn a dense reward function efficiently and achieve strong performance in high-dimensional tasks. RILe employs a novel trainer-student framework: the trainer learns an adaptive reward function, and the student uses this reward signal to imitate expert behaviors. By dynamically adjusting its guidance as the student evolves, the trainer provides nuanced feedback across different phases of learning. Our framework produces high-performing policies in high-dimensional tasks where direct imitation fails to replicate complex behaviors. We validate RILe in challenging robotic locomotion tasks, demonstrating that it significantly outperforms existing methods and achieves near-expert performance across multiple settings.