Abstract
Compositional generalization (understanding unseen combinations of seen primitives) is an essential reasoning capability in human intelligence. The AI community mainly studies this capability by fine-tuning neural networks on large numbers of training samples, while it remains unclear whether and how in-context learning (the prevailing few-shot paradigm based on large language models) exhibits compositional generalization. In this paper, we present CoFe, a test suite for investigating in-context compositional generalization. We find that compositional generalization performance can be easily affected by the selection of in-context examples, which raises the research question of what the key factors are in making good in-context examples for compositional generalization. We study three potential factors: similarity, diversity, and complexity. Our systematic experiments indicate that in-context examples should be structurally similar to the test case, diverse from each other, and individually simple. Furthermore, we observe two strong limitations: in-context compositional generalization on fictional words is much weaker than on commonly used ones, and it remains critical that the in-context examples cover the required linguistic structures, even though the backbone model has been pre-trained on a large corpus. We hope our analysis will facilitate the understanding and utilization of the in-context learning paradigm.