CWAE-IRL: Formulating a supervised approach to Inverse Reinforcement Learning problem

Abstract

Inverse reinforcement learning (IRL) is used to infer the reward function from the actions of an expert running a Markov Decision Process (MDP). This research proposes a novel approach that uses variational inference to learn the reward function. With this technique, the intractable posterior distribution of the continuous latent variable (here, the reward function) is analytically approximated so that it stays as close as possible to the prior belief while the model reconstructs the future state conditioned on the current state and action. The reward function is derived using a well-known deep generative model, the Conditional Variational Auto-encoder (CVAE), with a Wasserstein loss function. The method is therefore referred to as Conditional Wasserstein Auto-encoder-IRL (CWAE-IRL), and it can be analyzed as a combination of backward and forward inference. It offers an efficient alternative to previous IRL approaches and requires no knowledge of the agent's system dynamics. Experimental results on standard benchmarks such as objectworld and pendulum show that the proposed algorithm can effectively learn the latent reward function in complex, high-dimensional environments.
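The abstract does not give the exact objective, so the following is only a minimal sketch of the kind of loss it describes: a reconstruction term for the next state plus a penalty that keeps the encoded latent reward close to a prior. Several things here are assumptions and not taken from the paper: the penalty is a maximum mean discrepancy (MMD) with a Gaussian kernel, one common choice for Wasserstein auto-encoders; the latent reward is a scalar; and the names `encode`, `decode`, `cwae_style_loss`, and the weight `lam` are hypothetical, with linear maps standing in for neural networks.

```python
import math
import random


def rbf_kernel(x, y, bandwidth=1.0):
    """Gaussian (RBF) kernel between two scalar latent values."""
    return math.exp(-((x - y) ** 2) / (2.0 * bandwidth ** 2))


def mmd_squared(xs, ys, bandwidth=1.0):
    """Biased estimate of the squared MMD between two scalar samples."""
    def mean_kernel(us, vs):
        total = sum(rbf_kernel(u, v, bandwidth) for u in us for v in vs)
        return total / (len(us) * len(vs))

    return mean_kernel(xs, xs) + mean_kernel(ys, ys) - 2.0 * mean_kernel(xs, ys)


def encode(state, action, next_state, w):
    """Hypothetical linear encoder: (s, a, s') -> scalar latent reward."""
    return w[0] * state + w[1] * action + w[2] * next_state


def decode(state, action, reward, v):
    """Hypothetical linear decoder: (s, a, r) -> predicted next state."""
    return v[0] * state + v[1] * action + v[2] * reward


def cwae_style_loss(transitions, w, v, prior_samples, lam=10.0):
    """Reconstruction error plus lam times the MMD penalty (assumed form).

    transitions: list of (state, action, next_state) scalar triples.
    prior_samples: draws from the assumed prior over the latent reward.
    Returns (total, reconstruction, penalty).
    """
    rewards = [encode(s, a, s2, w) for s, a, s2 in transitions]
    recon = sum(
        (decode(s, a, r, v) - s2) ** 2
        for (s, a, s2), r in zip(transitions, rewards)
    ) / len(transitions)
    penalty = mmd_squared(rewards, prior_samples)
    return recon + lam * penalty, recon, penalty


if __name__ == "__main__":
    random.seed(0)
    # Toy expert data: next state is state + action plus small noise.
    data = []
    for _ in range(32):
        s, a = random.uniform(-1, 1), random.choice([-1.0, 1.0])
        data.append((s, a, s + a + random.gauss(0.0, 0.05)))
    prior = [random.gauss(0.0, 1.0) for _ in range(32)]  # assumed N(0, 1) prior
    total, recon, penalty = cwae_style_loss(
        data, w=[0.0, 0.0, 1.0], v=[1.0, 1.0, 0.0], prior_samples=prior
    )
    print(f"total={total:.4f} recon={recon:.4f} mmd={penalty:.4f}")
```

In a real setting the encoder and decoder would be trained networks and the loss would be minimized by gradient descent. The sketch only shows how the two terms combine and that neither requires a model of the environment's transition dynamics, only observed (state, action, next state) triples.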
