Abstract
We present PhoBERT with two versions (PhoBERT-base and PhoBERT-large), the first public large-scale monolingual language models pre-trained for Vietnamese. Experimental results show that our PhoBERT consistently outperforms the recent best multilingual model XLM-R (Conneau et al., 2020) and improves the state-of-the-art in multiple Vietnamese-specific NLP tasks including Part-of-speech tagging, Dependency parsing, Named-entity recognition and Natural language inference. We release PhoBERT to facilitate future research and downstream applications for Vietnamese NLP. Our PhoBERT models are available at: https://github.com/VinAIResearch/PhoBERT