Symbolic Knowledge Distillation: from General Language Models to Commonsense Models

Abstract

The common practice for training commonsense models has gone from-human-to-corpus-to-machine: humans author commonsense knowledge graphs in order to train commonsense models. In this work, we investigate an alternative, from-machine-to-corpus-to-machine: general language models author these commonsense knowledge graphs to train commonsense models. Our study leads to a new framework, Symbolic Knowledge Distillation. As with prior art in Knowledge Distillation (Hinton et al., 2015), our approach uses larger models to teach smaller models. A key difference is that we distill knowledge symbolically, as text, in addition to the neural model. We also distill only one aspect, the commonsense of a general language model teacher, allowing the student to be a different type, a commonsense model. Altogether, we show that careful prompt engineering and a separately trained critic model allow us to selectively distill high-quality causal commonsense from GPT-3, a general language model. Empirical results demonstrate that, for the first time, a human-authored commonsense knowledge graph is surpassed by our automatically distilled variant in all three criteria: quantity, quality, and diversity. In addition, it results in a neural commonsense model that surpasses the teacher model's commonsense capabilities despite its 100x smaller size. We apply this to the ATOMIC resource, and share our new symbolic knowledge graph and commonsense models.
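
The from-machine-to-corpus-to-machine loop described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: `distill_corpus`, `toy_teacher`, `toy_critic`, the `threshold` value, and the example event and relation label are all hypothetical stand-ins. In a real setting the teacher would be a prompted general language model and the critic a separately trained classifier; here both are replaced by toy functions so the sketch runs on its own.

```python
from typing import Callable, Iterable, List, Tuple

# A knowledge triple: (event, relation, inference).
Triple = Tuple[str, str, str]


def distill_corpus(
    events: Iterable[str],
    relation: str,
    teacher: Callable[[str, str], List[str]],
    critic: Callable[[Triple], float],
    threshold: float = 0.5,
) -> List[Triple]:
    """Generate candidate triples with the teacher; keep those the critic accepts."""
    kept: List[Triple] = []
    seen = set()
    for event in events:
        for inference in teacher(event, relation):
            triple = (event, relation, inference.strip())
            if triple in seen:
                continue  # skip exact duplicate generations
            seen.add(triple)
            if critic(triple) >= threshold:
                kept.append(triple)
    return kept


# Toy stand-ins (assumptions for illustration only): a real teacher would
# prompt a large language model, and a real critic would be a trained scorer.
def toy_teacher(event: str, relation: str) -> List[str]:
    canned = {
        "X pays for Y's coffee": [
            "X is generous",
            "X is generous",
            "the moon is cheese",
        ],
    }
    return canned.get(event, [])


def toy_critic(triple: Triple) -> float:
    # Assign a low score to the obviously implausible inference.
    return 0.1 if "moon" in triple[2] else 0.9


corpus = distill_corpus(["X pays for Y's coffee"], "xAttr", toy_teacher, toy_critic)
print(corpus)
```

The filtered `corpus` is the symbolic artifact: a text knowledge graph that could then serve as training data for a smaller student model. Raising `threshold` trades quantity for quality, since fewer generations pass the critic.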
