Abstract
Neural networks for natural language reasoning have largely focused on extractive, fact-based question-answering (QA) and common-sense inference. However, it is also crucial to understand the extent to which neural networks can perform relational reasoning and combinatorial generalization from natural language, abilities that are often obscured by annotation artifacts and the dominance of language modeling in standard QA benchmarks. In this work, we present a novel benchmark dataset for language understanding that isolates performance on relational reasoning. We also present a neural message-passing baseline and show that this model, which incorporates a relational inductive bias, achieves better combinatorial generalization than a traditional recurrent neural network approach.