Abstract
Rapid and accurate identification of venous thromboembolism (VTE), a severe cardiovascular condition that includes deep vein thrombosis (DVT) and pulmonary embolism (PE), is important for effective treatment. Automated methods that apply Natural Language Processing (NLP) to radiology reports have shown promising advances in identifying VTE events from retrospective data cohorts or in aiding clinical experts who identify VTE events from radiology reports. However, effectively training Deep Learning (DL) and NLP models is challenging due to limited labeled medical text data, the complexity and heterogeneity of radiology reports, and data imbalance. This study proposes a novel combination of DL methods with data augmentation, adaptive pre-trained NLP model selection, and a clinical expert NLP rule-based classifier to improve the accuracy of VTE identification in unstructured (free-text) radiology reports. Our experimental results demonstrate the model's efficacy: it achieves 97\% accuracy and a 97\% F1 score in predicting DVT, and 98.3\% accuracy and a 98.4\% F1 score in predicting PE. These findings highlight the model's robustness and its potential to contribute significantly to VTE research.