Ethical-Advice Taker: Do Language Models Understand Natural Language Interventions?

Abstract

Is it possible to use natural language to intervene in a model's behavior and alter its prediction in a desired way? We investigate the effectiveness of natural language interventions for reading-comprehension systems, studying this in the context of social stereotypes. Specifically, we propose a new language understanding task, Linguistic Ethical Interventions (LEI), where the goal is to amend a question-answering (QA) model's unethical behavior by communicating context-specific principles of ethics and equity to it. To this end, we build upon recent methods for quantifying a system's social stereotypes, augmenting them with different kinds of ethical interventions and the desired model behavior under such interventions. Our zero-shot evaluation finds that even today's powerful neural language models are extremely poor ethical-advice takers, that is, they respond surprisingly little to ethical interventions even though these interventions are stated as simple sentences. Few-shot learning improves model behavior but remains far from the desired outcome, especially when evaluated for various types of generalization. Our new task thus poses a novel language understanding challenge for the community.
