Common Benchmarks Undervalue the Generalization Power of Programmatic Policies

Abstract

Algorithms for learning programmatic representations for sequential decision-making problems are often evaluated on out-of-distribution (OOD) problems, with the common conclusion that programmatic policies generalize better than neural policies on OOD problems. In this position paper, we argue that commonly used benchmarks undervalue the generalization capabilities of programmatic representations. We analyze the experiments of four papers from the literature and show that neural policies, which were shown not to generalize, can generalize as effectively as programmatic policies on OOD problems. This is achieved with simple changes in the neural policies' training pipeline. Namely, we show that simpler neural architectures with the same type of sparse observation used with programmatic policies can help attain OOD generalization. Another modification we have shown to be effective is the use of reward functions that allow for safer policies (e.g., agents that drive slowly can generalize better). Also, we argue for creating benchmark problems highlighting concepts needed for OOD generalization that may challenge neural policies but align with programmatic representations, such as tasks requiring algorithmic constructs like stacks.
