Neural Natural Language Inference Models Partially Embed Theories of Lexical Entailment and Negation

Abstract

We address whether neural models for Natural Language Inference (NLI) can learn the compositional interactions between lexical entailment and negation, using four methods: the behavioral evaluation methods of (1) challenge test sets and (2) systematic generalization tasks, and the structural evaluation methods of (3) probes and (4) interventions. To facilitate this holistic evaluation, we present Monotonicity NLI (MoNLI), a new naturalistic dataset focused on lexical entailment and negation. In our behavioral evaluations, we find that models trained on general-purpose NLI datasets fail systematically on MoNLI examples containing negation, but that MoNLI fine-tuning addresses this failure. In our structural evaluations, we look for evidence that our top-performing BERT-based model has learned to implement the monotonicity algorithm behind MoNLI. Probes yield evidence consistent with this conclusion, and our intervention experiments bolster this, showing that the causal dynamics of the model mirror the causal dynamics of this algorithm on subsets of MoNLI. This suggests that the BERT model at least partially embeds a theory of lexical entailment and negation at an algorithmic level.
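
To make the interaction between lexical entailment and negation concrete, the following is a minimal toy sketch of a monotonicity-style inference rule. It is an illustration under stated assumptions and not necessarily the exact algorithm used in the paper: the `HYPERNYM` lexicon, the `Label` type, and the `infer` function are hypothetical names introduced here, and the sketch assumes that premise and hypothesis differ in a single word and that the only labels are entailment and neutral.

```python
from enum import Enum


class Label(Enum):
    ENTAILMENT = "entailment"
    NEUTRAL = "neutral"


# Hypothetical toy lexicon (an assumption for illustration):
# maps each word to its direct hypernym.
HYPERNYM = {"poodle": "dog", "dog": "animal", "elm": "tree"}


def is_hyponym_or_equal(word: str, other: str) -> bool:
    """True if `word` equals `other` or reaches it by following hypernym links."""
    current = word
    while current is not None:
        if current == other:
            return True
        current = HYPERNYM.get(current)
    return False


def infer(premise_word: str, hypothesis_word: str, negated: bool) -> Label:
    """Predict the label for a pair of sentences differing in one word.

    In a positive context, a more specific premise word entails a more
    general hypothesis word. Negation reverses that direction, so the
    two words are swapped before the lexical check.
    """
    if negated:
        premise_word, hypothesis_word = hypothesis_word, premise_word
    if is_hyponym_or_equal(premise_word, hypothesis_word):
        return Label.ENTAILMENT
    return Label.NEUTRAL
```

Under these assumptions, `infer("dog", "animal", negated=False)` yields entailment ("holds a dog" entails "holds an animal"), while `infer("dog", "animal", negated=True)` yields neutral, because "does not hold a dog" does not entail "does not hold an animal".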
