Confidence-Aware Imitation Learning from Demonstrations with Varying Optimality

Abstract

Most existing imitation learning approaches assume the demonstrations are drawn from experts who are optimal, but relaxing this assumption enables us to use a wider range of data. Standard imitation learning may learn a suboptimal policy from demonstrations with varying optimality. Prior works use confidence scores or rankings to capture beneficial information from demonstrations with varying optimality, but they suffer from many limitations, e.g., they require manually annotated confidence scores or a high average optimality of demonstrations. In this paper, we propose a general framework to learn from demonstrations with varying optimality that jointly learns the confidence score and a well-performing policy. Our approach, Confidence-Aware Imitation Learning (CAIL), learns a well-performing policy from confidence-reweighted demonstrations, while using an outer loss to track the performance of our model and to learn the confidence. We provide theoretical guarantees on the convergence of CAIL and evaluate its performance in both simulated and real robot experiments. Our results show that CAIL significantly outperforms other imitation learning methods when learning from demonstrations with varying optimality. We further show that even without access to any optimal demonstrations, CAIL can still learn a successful policy and outperforms prior work.
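The joint learning of confidence and policy described above can be illustrated with a toy bi-level loop. The following is a minimal sketch, not the CAIL algorithm itself: the abstract does not specify the form of the outer loss, so this sketch assumes a squared error on a small hypothetical evaluation set as a stand-in, and it uses a 1-D linear policy whose inner problem has a closed-form solution. All names (`fit_policy`, `outer_grad`, `learn_confidence`, `eval_set`) and all data values are illustrative assumptions.

```python
def fit_policy(demos, conf):
    """Inner problem: confidence-weighted least squares for a 1-D policy a = theta * s."""
    num = sum(w * s * a for (s, a), w in zip(demos, conf))
    den = sum(w * s * s for (s, a), w in zip(demos, conf))
    return num / den


def outer_grad(theta, eval_set):
    """Gradient w.r.t. theta of the stand-in outer loss: mean squared error on eval_set."""
    return sum(2.0 * (theta * s - a) * s for s, a in eval_set) / len(eval_set)


def learn_confidence(demos, eval_set, steps=200, lr=0.5):
    """Alternate between fitting the policy and updating per-demonstration confidence."""
    conf = [0.5] * len(demos)  # start with equal confidence in every demonstration
    for _ in range(steps):
        theta = fit_policy(demos, conf)
        den = sum(w * s * s for (s, a), w in zip(demos, conf))
        g_theta = outer_grad(theta, eval_set)
        updated = []
        for (s, a), w in zip(demos, conf):
            # Derivative of the closed-form inner solution w.r.t. this confidence.
            dtheta_dw = s * (a - theta * s) / den
            w = w - lr * g_theta * dtheta_dw
            updated.append(min(1.0, max(0.0, w)))  # keep confidence in [0, 1]
        conf = updated
    return fit_policy(demos, conf), conf


# Hypothetical data: four demonstrations following a = 2s and four following a = -s.
demos = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0), (4.0, 8.0),
         (1.0, -1.0), (2.0, -2.0), (3.0, -3.0), (4.0, -4.0)]
# Hypothetical evaluation pairs used only by the stand-in outer loss.
eval_set = [(1.0, 2.0), (2.0, 4.0)]

theta, conf = learn_confidence(demos, eval_set)
```

In this toy setting the confidence of the demonstrations that disagree with the evaluation pairs is driven toward zero, and the reweighted policy moves toward the behavior of the remaining demonstrations. The evaluation set here is a simplification for illustration only; it does not reproduce the setting in which no optimal demonstrations are available.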
