PhoBERT: Pre-trained language models for Vietnamese

  • 2020-04-30 17:36:29
  • Dat Quoc Nguyen, Anh Tuan Nguyen

Abstract

We present PhoBERT in two versions, PhoBERT-base and PhoBERT-large, the first public large-scale monolingual language models pre-trained for Vietnamese. Experimental results show that PhoBERT consistently outperforms the recent best multilingual model XLM-R (Conneau et al., 2020) and improves the state of the art in multiple Vietnamese-specific NLP tasks, including Part-of-speech tagging, Dependency parsing, Named-entity recognition and Natural language inference. We release PhoBERT to facilitate future research and downstream applications for Vietnamese NLP. Our PhoBERT models are available at: https://github.com/VinAIResearch/PhoBERT

 
