ACE-ICD: Acronym Expansion As Data Augmentation For Automated ICD Coding

  • 2025-11-10 17:11:20
  • Tuan-Dung Le, Shohreh Haddadan, Thanh Q. Thieu

Abstract

Automatic ICD coding, the task of assigning disease and procedure codes to electronic medical records, is crucial for clinical documentation and billing. While existing methods primarily enhance model understanding of code hierarchies and synonyms, they often overlook the pervasive use of medical acronyms in clinical notes, a key factor in ICD code inference. To address this gap, we propose a novel and effective data augmentation technique that leverages large language models to expand medical acronyms, allowing models to be trained on their full-form representations. Moreover, we incorporate consistency training to regularize predictions by enforcing agreement between the original and augmented documents. Extensive experiments on the MIMIC-III dataset demonstrate that our approach, ACE-ICD, establishes new state-of-the-art performance across multiple settings, including common codes, rare codes, and full-code assignments. Our code is publicly available.
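
The two ideas in the abstract, acronym expansion and consistency training, can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes a small hypothetical lookup table (ACRONYMS) in place of the large language model the paper uses for expansion, and it assumes a symmetric KL divergence over per-code probabilities as the agreement term, since the abstract does not specify the exact consistency loss. All function names here are illustrative.

```python
import math
import re

# Hypothetical acronym table for illustration only; the paper relies on a
# large language model to produce expansions, not a fixed dictionary.
ACRONYMS = {"CHF": "congestive heart failure", "MI": "myocardial infarction"}


def expand_acronyms(note, table=ACRONYMS):
    """Replace whole-word acronyms in a note with their full forms."""
    pattern = re.compile(r"\b(" + "|".join(map(re.escape, table)) + r")\b")
    return pattern.sub(lambda m: table[m.group(1)], note)


def bernoulli_kl(p, q, eps=1e-7):
    """KL divergence between two Bernoulli distributions with means p and q."""
    p = min(max(p, eps), 1 - eps)
    q = min(max(q, eps), 1 - eps)
    return p * math.log(p / q) + (1 - p) * math.log((1 - p) / (1 - q))


def consistency_loss(probs_orig, probs_aug):
    """Mean symmetric KL over per-code probabilities (an assumed loss form).

    probs_orig: predicted code probabilities for the original document.
    probs_aug:  predicted code probabilities for the acronym-expanded document.
    """
    pairs = list(zip(probs_orig, probs_aug))
    total = sum(bernoulli_kl(p, q) + bernoulli_kl(q, p) for p, q in pairs)
    return total / len(pairs)
```

In a training loop under these assumptions, the consistency term would be added to the usual multi-label classification loss computed on both the original and the expanded note, so that the model is penalized when its predictions for the two versions disagree.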

 
