PhoBERT: Pre-trained language models for Vietnamese

Abstract

We present PhoBERT in two versions, PhoBERT-base and PhoBERT-large, the first public large-scale monolingual language models pre-trained for Vietnamese. Experimental results show that PhoBERT consistently outperforms the recent best multilingual model XLM-R (Conneau et al., 2020) and improves the state of the art on multiple Vietnamese-specific NLP tasks, including part-of-speech tagging, dependency parsing, named-entity recognition and natural language inference. We release PhoBERT to facilitate future research and downstream applications for Vietnamese NLP. Our PhoBERT models are available at: https://github.com/VinAIResearch/PhoBERT
