Abstract
We describe an expressive class of policies that can be efficiently learned from a few demonstrations. Policies are represented as logical combinations of programs drawn from a small domain-specific language (DSL). We define a prior over policies with a probabilistic grammar and derive an approximate Bayesian inference algorithm to learn policies from demonstrations. In experiments, we study five strategy games played on a 2D grid with one shared DSL. After a few demonstrations of each game, the inferred policies generalize to new game instances that differ substantially from the demonstrations. We argue that the proposed method is an apt choice for policy learning tasks that have scarce training data and feature significant, structured variation between task instances.
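To make the phrase "logical combinations of programs" concrete, the following is a minimal illustrative sketch, not the implementation described in this paper. It assumes a toy DSL whose programs are Boolean tests on a grid cell, and a policy that permits every cell where a conjunction or disjunction of such tests holds. All names (`cell_is`, `shifted`, `all_of`, `any_of`, `allowed_actions`) and the example grids are hypothetical. The prior and the inference procedure are not shown.

```python
from typing import Callable, List, Tuple

Grid = List[List[str]]
Cell = Tuple[int, int]
# Assumption: a "program" is a Boolean test on (grid, candidate cell).
Program = Callable[[Grid, Cell], bool]


def cell_is(value: str) -> Program:
    """Hypothetical DSL primitive: true if the cell is in bounds and holds `value`."""
    def program(grid: Grid, cell: Cell) -> bool:
        r, c = cell
        in_bounds = 0 <= r < len(grid) and 0 <= c < len(grid[0])
        return in_bounds and grid[r][c] == value
    return program


def shifted(dr: int, dc: int, program: Program) -> Program:
    """Hypothetical DSL combinator: evaluate `program` at an offset cell."""
    def moved(grid: Grid, cell: Cell) -> bool:
        return program(grid, (cell[0] + dr, cell[1] + dc))
    return moved


def all_of(*programs: Program) -> Program:
    """Logical conjunction of programs."""
    return lambda grid, cell: all(p(grid, cell) for p in programs)


def any_of(*programs: Program) -> Program:
    """Logical disjunction of programs."""
    return lambda grid, cell: any(p(grid, cell) for p in programs)


def allowed_actions(policy: Program, grid: Grid) -> List[Cell]:
    """Return every cell the policy permits, in row-major order."""
    return [
        (r, c)
        for r in range(len(grid))
        for c in range(len(grid[0]))
        if policy(grid, (r, c))
    ]


# Example policy: choose an empty cell that is horizontally adjacent to an "X".
policy = all_of(
    cell_is("."),
    any_of(shifted(0, -1, cell_is("X")), shifted(0, 1, cell_is("X"))),
)

small_grid = [list("..X."), list("....")]
wider_grid = [list("....."), list(".X...")]
```

Here `allowed_actions(policy, small_grid)` selects the two empty cells on either side of the `X`, and the same policy applies unchanged to `wider_grid`, which has a different size and layout. Because the rule is stated in terms of relative structure rather than absolute positions, it carries over to instances that look quite different from the one it was written for.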