### Abstract

The goal of the inverse reinforcement learning (IRL) problem is to recover the reward function from expert demonstrations. However, like any ill-posed inverse problem, the IRL problem suffers from an inherent defect: a policy may be optimal for many reward functions, and expert demonstrations may be optimal for many policies. In this work, we generalize the IRL problem to a well-posed expectation optimization problem, stochastic inverse reinforcement learning (SIRL), which recovers a probability distribution over reward functions. We adopt the Monte Carlo expectation-maximization (MCEM) method to estimate the parameter of this probability distribution, giving the first solution to the SIRL problem. The solution is succinct, robust, and transferable for a learning task, and it can generate alternative solutions to the IRL problem. Our formulation makes it possible to observe the intrinsic property of the IRL problem from a global viewpoint, and our approach achieves considerable performance on the objectworld environment.