WeLM: A Well-Read Pre-trained Language Model for Chinese

Abstract

Large language models pre-trained with self-supervised learning have demonstrated impressive zero-shot generalization capabilities on a wide spectrum of tasks. In this work, we present WeLM: a well-read pre-trained language model for Chinese that is able to seamlessly perform different types of tasks with zero- or few-shot demonstrations. WeLM is trained with 10B parameters by "reading" a curated high-quality corpus covering a wide range of topics. We show that WeLM is equipped with broad knowledge of various domains and languages. On 18 monolingual (Chinese) tasks, WeLM significantly outperforms existing pre-trained models of similar size and matches the performance of models up to 25 times larger. WeLM also exhibits strong capabilities in multilingual and code-switching understanding, outperforming existing multilingual language models pre-trained on 30 languages. Furthermore, we collected human-written prompts for a large set of supervised datasets in Chinese and fine-tuned WeLM with multi-prompted training. The resulting model attains strong generalization on unseen types of tasks and outperforms the unsupervised WeLM in zero-shot learning. Finally, we demonstrate that WeLM has basic skills in explaining and calibrating its own decisions, which can be promising directions for future research. Our models can be applied for at https://welm.weixin.qq.com/docs/api/.
