Finite-Sample Bounds for Adaptive Inverse Reinforcement Learning using Passive Langevin Dynamics

Abstract

This paper provides a finite-sample analysis of a passive stochastic gradient Langevin dynamics (PSGLD) algorithm designed to achieve adaptive inverse reinforcement learning (IRL). Adaptive IRL aims to estimate the cost function of a forward learner performing a stochastic gradient algorithm (e.g., policy gradient reinforcement learning) by observing its estimates in real time. The PSGLD algorithm is considered passive because it incorporates noisy gradients provided by an external stochastic gradient algorithm (the forward learner), over which it has no control. The PSGLD algorithm acts as a randomized sampler that achieves adaptive IRL by reconstructing the forward learner's cost function nonparametrically from the stationary measure of a Langevin diffusion. This paper analyzes its non-asymptotic (finite-sample) performance: we provide explicit bounds on the 2-Wasserstein distance between the PSGLD algorithm's sample measure and the stationary measure encoding the cost function, and we provide guarantees for a kernel density estimation scheme that reconstructs the cost function from empirical samples. Our analysis uses tools from the study of Markov diffusion operators. The derived bounds have both practical and theoretical significance. They provide finite-time guarantees for an adaptive IRL mechanism, and they substantially generalize the analytical framework of a line of research in passive stochastic gradient algorithms.
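To make the mechanism concrete, the following is a minimal one-dimensional sketch of a passive Langevin sampler followed by a kernel density reconstruction of the cost. It is an illustration under stated assumptions, not the algorithm as specified in the paper, since the abstract does not give the update rule. The quadratic cost, the Gaussian kernel, every parameter value, and all function names are hypothetical. In particular, the forward learner's trajectory is replaced by i.i.d. uniform evaluation points, so that the density of those points is a known constant `p0`.

```python
import math
import random


def true_cost_grad(x):
    # Hypothetical forward-learner cost c(x) = 0.5 * (x - 1)^2, so c'(x) = x - 1.
    return x - 1.0


def gaussian_kernel(u):
    # Standard Gaussian density, used both for gradient weighting and for KDE.
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)


def passive_langevin(n_steps=400_000, step=0.5, beta=2.0, bandwidth=0.2,
                     half_width=5.0, grad_noise=0.5, seed=0):
    """Run a passive Langevin sampler and return thinned post-burn-in samples."""
    rng = random.Random(seed)
    # Assumption: evaluation points are uniform on [-half_width, half_width].
    p0 = 1.0 / (2.0 * half_width)
    alpha = 0.0
    samples = []
    burn_in = n_steps // 10
    for k in range(n_steps):
        # The sampler does not choose theta; it only observes where the
        # gradient was evaluated and a noisy value of that gradient.
        theta = rng.uniform(-half_width, half_width)
        noisy_grad = true_cost_grad(theta) + grad_noise * rng.gauss(0.0, 1.0)
        # Kernel weight: use the observed gradient only if theta is near alpha.
        weight = gaussian_kernel((theta - alpha) / bandwidth) / bandwidth
        drift = -step * weight * 0.5 * beta * noisy_grad * p0
        diffusion = math.sqrt(step) * p0 * rng.gauss(0.0, 1.0)
        alpha += drift + diffusion
        if k >= burn_in and k % 20 == 0:
            samples.append(alpha)
    return samples


def estimate_cost(x, samples, beta=2.0, bandwidth=0.2):
    """Estimate c(x) up to an additive constant as -log(KDE density) / beta."""
    total = sum(gaussian_kernel((x - s) / bandwidth) for s in samples)
    density = total / (len(samples) * bandwidth)
    return -math.log(density) / beta


if __name__ == "__main__":
    draws = passive_langevin()
    gap = estimate_cost(2.0, draws) - estimate_cost(1.0, draws)
    print("estimated c(2) - c(1):", gap)
```

In this toy setting the averaged dynamics have a stationary density proportional to exp(-beta * c(x)), so the negative log of the estimated density, divided by `beta`, recovers the cost up to an additive constant. The true gap c(2) - c(1) is 0.5 here; the estimate is only approximate because of kernel smoothing, discretization, and sampling noise, which is the kind of error the finite-sample bounds are meant to control.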
