Abstract
Although cutting-edge representation learning for language is well developed, most language representation models focus on a specific level of linguistic unit. This work introduces universal language representation learning, i.e., embedding linguistic units of different levels, or texts of widely varying lengths, in a uniform vector space. We propose MiSAD, a training objective for pre-trained language models that utilizes meaningful n-grams extracted from a large unlabeled corpus by a simple but effective algorithm. We then empirically verify that a well-designed pre-training scheme may effectively yield universal language representations, which greatly simplify handling multiple levels of linguistic objects in a unified way. In particular, our model achieves the highest accuracy on analogy tasks at different language levels and significantly improves performance on downstream tasks in the GLUE benchmark and on a question answering dataset.