### Abstract

Inverse reinforcement learning (IRL) is an ill-posed inverse problem: expert demonstrations may be consistent with many reward functions, which are hard to recover with local search methods such as gradient methods. In this paper, we generalize the original IRL problem to recovering a probability distribution over reward functions. We call this generalized problem stochastic inverse reinforcement learning (SIRL), which we first formulate as an expectation optimization problem. We adopt the Monte Carlo expectation-maximization (MCEM) method, a global search method, to estimate the parameter of the probability distribution, which provides the first solution to SIRL. With our approach, it is possible to observe the intrinsic properties of IRL from a global viewpoint, and the technique achieves considerably robust recovery performance on the classic learning environment objectworld.