Abstract
Accurate identification of breast lesion subtypes can facilitate personalized treatment and interventions. Ultrasound (US), as a safe and accessible imaging modality, is extensively employed in breast abnormality screening and diagnosis. However, the incidence of different subtypes exhibits a skewed long-tailed distribution, posing significant challenges for automated recognition. Generative augmentation provides a promising solution to rectify the data distribution. Inspired by this, we propose a dual-phase framework for long-tailed classification that mitigates distributional bias through high-fidelity data synthesis while avoiding overuse of synthetic data that would degrade overall performance. The framework incorporates a reinforcement learning-driven adaptive sampler that dynamically calibrates synthetic-real data ratios by training a strategic multi-agent system to compensate for the scarcity of real data while ensuring stable discriminative capability. Furthermore, our class-controllable synthetic network integrates a sketch-grounded perception branch that harnesses anatomical priors to maintain distinctive class features while enabling annotation-free inference. Extensive experiments on an in-house long-tailed dataset and a public imbalanced breast US dataset demonstrate that our method achieves promising performance compared to state-of-the-art approaches. More synthetic images can be found at https://github.com/Stinalalala/Breast-LT-GenAug.