NEZHA: Neural Contextualized Representation for Chinese Language Understanding

Abstract

The pre-trained language models have achieved great successes in variousnatural language understanding (NLU) tasks due to its capacity to capture thedeep contextualized information in text by pre-training on large-scale corpora.In this technical report, we present our practice of pre-training languagemodels named NEZHA (NEural contextualiZed representation for CHinese lAnguageunderstanding) on Chinese corpora and finetuning for the Chinese NLU tasks. Thecurrent version of NEZHA is based on BERT with a collection of provenimprovements, which include Functional Relative Positional Encoding as aneffective positional encoding scheme, Whole Word Masking strategy, MixedPrecision Training and the LAMB Optimizer in training the models. Theexperimental results show that NEZHA achieves the state-of-the-art performanceswhen finetuned on several representative Chinese tasks, including named entityrecognition (People's Daily NER), sentence matching (LCQMC), Chinese sentimentclassification (ChnSenti) and natural language inference (XNLI).

Quick Read (beta)

loading the full paper ...