Abstract
Answering complex questions about textual narratives requires reasoning over both stated context and the world knowledge that underlies it. However, pretrained language models (LM), the foundation of most modern QA systems, do not robustly represent latent relationships between concepts, which is necessary for reasoning. While knowledge graphs (KG) are often used to augment LMs with structured representations of world knowledge, it remains an open question how to effectively fuse and reason over the KG representations and the language context, which provides situational constraints and nuances. In this work, we propose GreaseLM, a new model that fuses encoded representations from pretrained LMs and graph neural networks over multiple layers of modality interaction operations. Information from both modalities propagates to the other, allowing language context representations to be grounded by structured world knowledge, and allowing linguistic nuances (e.g., negation, hedging) in the context to inform the graph representations of knowledge. Our results on three benchmarks in the commonsense reasoning (i.e., CommonsenseQA, OpenbookQA) and medical question answering (i.e., MedQA-USMLE) domains demonstrate that GreaseLM can more reliably answer questions that require reasoning over both situational constraints and structured knowledge, even outperforming models 8x larger.