An Application of Pseudo-Log-Likelihoods to Natural Language Scoring

  • 2022-01-23 22:00:54
  • Darren Abramson, Ali Emami

Abstract

Language models built using semi-supervised machine learning on large corpora of natural language have very quickly enveloped the fields of natural language generation and understanding. In this paper we apply a zero-shot approach independently developed by a number of researchers now gaining recognition as a significant alternative to fine-tuning for evaluation on common sense tasks. A language model with relatively few parameters and training steps compared to a more recent language model (T5) can outperform it on a recent large data set (TimeDial), while displaying robustness in its performance across a similar class of language tasks. Surprisingly, this result is achieved by using a hyperparameter-free zero-shot method with the smaller model, compared to fine-tuning to the larger model. We argue that robustness of the smaller model ought to be understood in terms of compositionality, in a sense that we draw from recent literature on a class of similar models. We identify a practical cost for our method and model: high GPU-time for natural language evaluation. The zero-shot measurement technique that produces remarkable stability, both for ALBERT and other BERT variants, is an application of pseudo-log-likelihoods to masked language models for the relative measurement of probability for substitution alternatives in forced choice language tasks such as the Winograd Schema Challenge, Winogrande, and others. One contribution of this paper is to bring together a number of similar, but independent strands of research. We produce some absolute state-of-the-art results for common sense reasoning in binary choice tasks, performing better than any published result in the literature, including fine-tuned efforts. We show a remarkable consistency of the model's performance under adversarial settings, which we argue is best explained by the model's compositionality of representations.
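
The scoring idea named in the abstract can be illustrated with a minimal sketch. It assumes one common formulation of the pseudo-log-likelihood: mask each token in turn, take the log-probability the masked language model assigns to the original token given the rest of the sentence, and sum over positions. The candidate whose filled-in sentence scores highest is chosen. Every name below (pseudo_log_likelihood, choose, toy_masked_log_prob, AFFINITY) is hypothetical, and the toy scorer is only a stand-in for a real masked language model such as ALBERT; this is not the authors' implementation, which may differ in tokenization and in which positions are masked.

```python
import math
from typing import Callable, List, Sequence, Tuple

# Hypothetical interface: given the tokens of a sentence and a position,
# return log P(tokens[position] | all other tokens), i.e. what a masked
# language model would report after masking that single position.
MaskedLogProb = Callable[[Sequence[str], int], float]


def pseudo_log_likelihood(tokens: Sequence[str], masked_log_prob: MaskedLogProb) -> float:
    """Sum the masked log-probability of every token in the sentence."""
    return sum(masked_log_prob(tokens, i) for i in range(len(tokens)))


def choose(template: str, candidates: Sequence[str],
           masked_log_prob: MaskedLogProb) -> Tuple[str, List[float]]:
    """Fill the '_' slot with each candidate and return the best-scoring one."""
    scores = [
        pseudo_log_likelihood(template.replace("_", c).split(), masked_log_prob)
        for c in candidates
    ]
    best = max(range(len(candidates)), key=lambda i: scores[i])
    return candidates[best], scores


# Toy stand-in for a real masked LM (an assumption for illustration only):
# a token is more probable when a compatible word appears in its context.
AFFINITY = {("trophy", "big"): 0.6, ("suitcase", "big"): 0.2}


def toy_masked_log_prob(tokens: Sequence[str], position: int) -> float:
    target = tokens[position]
    context = list(tokens[:position]) + list(tokens[position + 1:])
    prob = 0.1  # baseline probability for any token
    for word in context:
        prob = max(prob, AFFINITY.get((target, word), 0.0))
    return math.log(prob)


template = "the trophy doesn't fit in the suitcase because the _ is too big"
answer, scores = choose(template, ["trophy", "suitcase"], toy_masked_log_prob)
print(answer, scores)
```

Only the relative order of the scores matters for a forced choice task, which is why no threshold or other hyperparameter appears in the sketch. With a real model, each sentence needs one forward pass per masked position, which is consistent with the GPU-time cost the abstract mentions.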

 
