Language Modeling with Sparse Product of Sememe Experts

Abstract

Most language modeling methods rely on large-scale data to statisticallylearn the sequential patterns of words. In this paper, we argue that words areatomic language units but not necessarily atomic semantic units. Inspired byHowNet, we use sememes, the minimum semantic units in human languages, torepresent the implicit semantics behind words for language modeling, namedSememe-Driven Language Model (SDLM). More specifically, to predict the nextword, SDLM first estimates the sememe distribution gave textual context.Afterward, it regards each sememe as a distinct semantic expert, and theseexperts jointly identify the most probable senses and the corresponding word.In this way, SDLM enables language models to work beyond word-levelmanipulation to fine-grained sememe-level semantics and offers us more powerfultools to fine-tune language models and improve the interpretability as well asthe robustness of language models. Experiments on language modeling and thedownstream application of headline gener- ation demonstrate the significanteffect of SDLM. Source code and data used in the experiments can be accessed athttps:// github.com/thunlp/SDLM-pytorch.

Quick Read (beta)

loading the full paper ...