Abstract
Objective: Clinical knowledge-enriched transformer models (e.g., ClinicalBERT) have achieved state-of-the-art results on clinical NLP (natural language processing) tasks. One of the core limitations of these transformer models is the substantial memory consumption of their full self-attention mechanism, which leads to performance degradation on long clinical texts. To overcome this, we propose to leverage long-sequence transformer models (e.g., Longformer and BigBird), which extend the maximum input sequence length from 512 to 4096 tokens, to enhance the ability to model long-term dependencies in long clinical texts.

Materials and Methods: Inspired by the success of long-sequence transformer models and the fact that clinical notes are mostly long, we introduce two domain-enriched language models, Clinical-Longformer and Clinical-BigBird, which are pre-trained on a large-scale clinical corpus. We evaluate both language models using 10 baseline tasks including named entity recognition, question answering, natural language inference, and document classification tasks.

Results: The results demonstrate that Clinical-Longformer and Clinical-BigBird consistently and significantly outperform ClinicalBERT and other short-sequence transformers in all 10 downstream tasks and achieve new state-of-the-art results.

Discussion: Our pre-trained language models provide the bedrock for clinical NLP using long texts. We have made our source code available at https://github.com/luoyuanlab/Clinical-Longformer, and the pre-trained models available for public download at: https://huggingface.co/yikuan8/Clinical-Longformer.

Conclusion: This study demonstrates that clinical knowledge-enriched long-sequence transformers are able to learn long-term dependencies in long clinical text. Our methods can also inspire the development of other domain-enriched long-sequence transformers.
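
To illustrate why full self-attention becomes costly on long inputs, the following minimal sketch counts the pairwise attention scores per head for a 512-token and a 4096-token sequence, and compares them with a local-window scheme of the kind long-sequence transformers use. The function names and the window size of 512 are assumptions chosen for illustration; the sketch ignores global tokens, random attention, and all other implementation details, so it is not a description of the models evaluated here.

```python
def full_attention_entries(seq_len: int) -> int:
    """Attention scores per head when every token attends to every token."""
    return seq_len * seq_len


def windowed_attention_entries(seq_len: int, window: int) -> int:
    """Upper bound on attention scores per head when each token
    attends to at most `window` neighbouring tokens."""
    return seq_len * min(window, seq_len)


if __name__ == "__main__":
    window = 512  # assumed local window size, for illustration only
    for n in (512, 4096):
        full = full_attention_entries(n)
        local = windowed_attention_entries(n, window)
        print(f"seq_len={n}: full={full}, windowed<={local}")
```

Under these assumptions, growing the input from 512 to 4096 tokens multiplies the full-attention score count by 64, while the windowed bound grows only by a factor of 8, which is the linear-versus-quadratic gap that motivates long-sequence architectures.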