Trankit: A Light-Weight Transformer-based Toolkit for Multilingual Natural Language Processing

  • 2021-01-14 19:10:10
  • Minh Nguyen, Viet Lai, Amir Pouran Ben Veyseh, Thien Huu Nguyen

Abstract

We introduce Trankit, a light-weight Transformer-based Toolkit for multilingual Natural Language Processing (NLP). It provides a trainable pipeline for fundamental NLP tasks in over 100 languages, and 90 pretrained pipelines for 56 languages. Built on a state-of-the-art pretrained language model, Trankit significantly outperforms prior multilingual NLP pipelines on sentence segmentation, part-of-speech tagging, morphological feature tagging, and dependency parsing, while maintaining competitive performance on tokenization, multi-word token expansion, and lemmatization across 90 Universal Dependencies treebanks. Despite the use of a large pretrained transformer, our toolkit remains efficient in memory usage and speed. This is achieved by our novel plug-and-play mechanism with Adapters, in which a single multilingual pretrained transformer is shared across the pipelines for different languages. Our toolkit, along with pretrained models and code, is publicly available at: https://github.com/nlp-uoregon/trankit. A demo website for our toolkit is also available at: http://nlp.uoregon.edu/trankit. Finally, a demo video for Trankit is available at: https://youtu.be/q0KGP3zGjGc.
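The plug-and-play idea above (one large multilingual transformer shared by every language pipeline, plus a small Adapter per language) can be illustrated with a toy Python sketch. This is not Trankit's implementation: `SharedEncoder`, `Adapter`, `MultilingualPipeline`, `add` and `set_active` are hypothetical names, the "transformer" is a single random matrix, and the adapter is a generic bottleneck adapter (down-projection, ReLU, up-projection, residual connection) applied once after the encoder rather than inside each transformer layer, as adapter setups typically do. The point it shows is the parameter count: the shared weights are stored once, and each additional language adds only a small adapter.

```python
import random
from typing import Dict, List, Optional

Vector = List[float]
Matrix = List[List[float]]


def matvec(m: Matrix, v: Vector) -> Vector:
    """Multiply a (rows x cols) matrix by a vector of length cols."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def random_matrix(rows: int, cols: int, rng: random.Random) -> Matrix:
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


def count(m: Matrix) -> int:
    return sum(len(row) for row in m)


class SharedEncoder:
    """Hypothetical stand-in for the large pretrained transformer, stored once."""

    def __init__(self, dim: int, rng: random.Random) -> None:
        self.weight = random_matrix(dim, dim, rng)

    def encode(self, x: Vector) -> Vector:
        return matvec(self.weight, x)

    def num_params(self) -> int:
        return count(self.weight)


class Adapter:
    """Generic bottleneck adapter: h + W_up * relu(W_down * h)."""

    def __init__(self, dim: int, bottleneck: int, rng: random.Random) -> None:
        self.down = random_matrix(bottleneck, dim, rng)  # dim -> bottleneck
        self.up = random_matrix(dim, bottleneck, rng)    # bottleneck -> dim

    def __call__(self, h: Vector) -> Vector:
        z = [max(0.0, v) for v in matvec(self.down, h)]
        return [a + b for a, b in zip(h, matvec(self.up, z))]  # residual

    def num_params(self) -> int:
        return count(self.down) + count(self.up)


class MultilingualPipeline:
    """One shared encoder plus one small adapter per loaded language."""

    def __init__(self, dim: int = 64, bottleneck: int = 8, seed: int = 0) -> None:
        self.rng = random.Random(seed)
        self.dim, self.bottleneck = dim, bottleneck
        self.encoder = SharedEncoder(dim, self.rng)
        self.adapters: Dict[str, Adapter] = {}
        self.active: Optional[str] = None

    def add(self, lang: str) -> None:
        # Plug in a new language: only a small adapter is created.
        if lang not in self.adapters:
            self.adapters[lang] = Adapter(self.dim, self.bottleneck, self.rng)
        self.active = lang

    def set_active(self, lang: str) -> None:
        if lang not in self.adapters:
            raise KeyError(f"no adapter loaded for {lang!r}")
        self.active = lang

    def __call__(self, x: Vector) -> Vector:
        if self.active is None:
            raise RuntimeError("call add(lang) first")
        return self.adapters[self.active](self.encoder.encode(x))

    def num_params(self) -> int:
        return self.encoder.num_params() + sum(
            a.num_params() for a in self.adapters.values())


if __name__ == "__main__":
    pipe = MultilingualPipeline(dim=64, bottleneck=8)
    for lang in ["english", "vietnamese", "arabic"]:
        pipe.add(lang)
    pipe.set_active("english")
    print(len(pipe([1.0] * 64)))  # 64
    print(pipe.num_params())      # 7168
```

With dim=64 and bottleneck=8, each adapter adds 2 * 8 * 64 = 1024 parameters against 4096 for the shared encoder, so three languages cost 7168 parameters instead of 3 * (4096 + 1024) = 15360 for three independent copies.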
