Automated Feature Selection for Inverse Reinforcement Learning

Abstract

Inverse reinforcement learning (IRL) is an imitation learning approach to learning reward functions from expert demonstrations. Its use avoids the difficult and tedious procedure of manual reward specification while retaining the generalization power of reinforcement learning. In IRL, the reward is usually represented as a linear combination of features. In continuous state spaces, the state variables alone are not sufficiently rich to be used as features, but which features are good is not known in general. To address this issue, we propose a method that employs polynomial basis functions to form a candidate set of features, which are shown to allow the matching of statistical moments of state distributions. Feature selection is then performed for the candidates by leveraging the correlation between trajectory probabilities and feature expectations. We demonstrate the approach's effectiveness by recovering reward functions that capture expert policies across non-linear control tasks of increasing complexity. Code, data, and videos are available at https://sites.google.com/view/feature4irl.
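
As an illustration of the two ideas named above, the following minimal sketch builds polynomial candidate features and ranks them by correlation with a per-trajectory score. It is not the released implementation: the function names (`polynomial_features`, `feature_expectation`, `rank_by_correlation`), the undiscounted averaging, and the use of plain Pearson correlation are all assumptions made for illustration. The sketch does show why monomial features relate to moment matching: the average of a monomial feature over visited states is the corresponding empirical raw moment of the state distribution.

```python
from itertools import combinations_with_replacement

import numpy as np


def polynomial_features(states, degree):
    """Map states of shape (T, d) to all monomials of total degree 1..degree."""
    states = np.atleast_2d(np.asarray(states, dtype=float))
    n_dims = states.shape[1]
    columns = []
    for deg in range(1, degree + 1):
        # Each index tuple, e.g. (0, 1), denotes the monomial s_0 * s_1.
        for idx in combinations_with_replacement(range(n_dims), deg):
            columns.append(np.prod(states[:, list(idx)], axis=1))
    return np.stack(columns, axis=1)


def feature_expectation(states, degree):
    """Average feature vector along one trajectory (undiscounted; an assumption)."""
    return polynomial_features(states, degree).mean(axis=0)


def rank_by_correlation(traj_features, traj_scores):
    """Order feature indices by |Pearson correlation| with a per-trajectory score.

    Hypothetical stand-in for the selection step: traj_features has shape
    (n_trajectories, n_features), traj_scores has shape (n_trajectories,).
    """
    f = traj_features - traj_features.mean(axis=0)
    s = traj_scores - traj_scores.mean()
    denom = np.linalg.norm(f, axis=0) * np.linalg.norm(s)
    # Constant features have zero variance; assign them zero correlation.
    corr = np.divide(f.T @ s, denom, out=np.zeros(f.shape[1]), where=denom > 0)
    return np.argsort(-np.abs(corr)), corr


if __name__ == "__main__":
    trajectory = np.array([[1.0, 2.0], [3.0, 4.0]])
    # Columns are s0, s1, s0^2, s0*s1, s1^2, so the means are raw moments.
    print(feature_expectation(trajectory, degree=2))
```

With a weight vector `w` restricted to the selected features, a linear reward would then be evaluated as `polynomial_features(states, degree) @ w`, matching the linear-in-features form described in the abstract.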
