COGS: A Compositional Generalization Challenge Based on Semantic Interpretation

Abstract

Natural language is characterized by compositionality: the meaning of a complex expression is constructed from the meanings of its constituent parts. To facilitate the evaluation of the compositional abilities of language processing architectures, we introduce COGS, a semantic parsing dataset based on a fragment of English. The evaluation portion of COGS contains multiple systematic gaps that can only be addressed by compositional generalization; these include new combinations of familiar syntactic structures, or new combinations of familiar words and familiar structures. In experiments with Transformers and LSTMs, we found that in-distribution accuracy on the COGS test set was near-perfect (96--99%), but generalization accuracy was substantially lower (16--35%) and showed high sensitivity to random seed ($\pm$6--8%). These findings indicate that contemporary standard NLP models are limited in their compositional generalization capacity, and position COGS as a good way to measure progress.
