Abstract
What kinds of data, and how much of it, do language models need in order to induce the grammatical knowledge required to judge sentence acceptability? Recent language models still have much room for improvement in data efficiency compared to humans. Humans use indirect evidence efficiently, which is considered one of the inductive biases contributing to efficient language acquisition. This paper investigates whether language models likewise make efficient use of indirect data (indirect evidence), from which they infer sentence acceptability. To explore this question, we introduce the Wug InDirect Evidence Test (WIDET), a dataset consisting of training instances inserted into the pretraining data and evaluation instances. We inject synthetic instances with newly coined wug words into pretraining data and explore the model's behavior on evaluation data that assesses grammatical acceptability regarding those words. We prepare the injected instances by varying their levels of indirectness and quantity. Our experiments surprisingly show that, for certain linguistic phenomena, language models do not induce grammatical knowledge even after repeated exposure to instances that share the same structure as the evaluation instances and differ only in lexical items. Our findings suggest a potential direction for future research: developing models that use latent indirect evidence to induce grammatical knowledge.