Abstract
Large Language Models (LLMs) have achieved impressive performance in text summarization and are increasingly deployed in real-world applications. However, these systems often inherit associative and framing biases from pre-training data, leading to inappropriate or unfair outputs in downstream tasks. In this work, we present AdvSumm (Adversarial Summarization), a domain-agnostic training framework designed to mitigate bias in text summarization through improved generalization. Inspired by adversarial robustness, AdvSumm introduces a novel Perturber component that applies gradient-guided perturbations at the embedding level of Sequence-to-Sequence models, enhancing the model's robustness to input variations. We empirically demonstrate that AdvSumm effectively reduces different types of bias in summarization, specifically name-nationality bias and political framing bias, without compromising summarization quality. Compared to standard transformers and data augmentation techniques like back-translation, AdvSumm achieves stronger bias mitigation performance across benchmark datasets.
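
To make the phrase "gradient-guided perturbations at the embedding level" concrete, the following is a minimal sketch of the general idea and not the AdvSumm Perturber itself, whose design the abstract does not specify. It assumes a toy linear scorer with a squared-error loss in place of a Sequence-to-Sequence model, and an L2-normalized gradient step of size `epsilon`; the names `loss`, `loss_gradient`, `perturb`, and all numeric values are hypothetical and chosen only for illustration.

```python
import math

def loss(embedding, weights, target):
    """Squared-error loss of a toy linear scorer on one embedding vector.
    (Assumption: stands in for the summarization loss of a real model.)"""
    score = sum(w * e for w, e in zip(weights, embedding))
    return 0.5 * (score - target) ** 2

def loss_gradient(embedding, weights, target):
    """Analytic gradient of the toy loss with respect to the embedding."""
    residual = sum(w * e for w, e in zip(weights, embedding)) - target
    return [residual * w for w in weights]

def perturb(embedding, gradient, epsilon):
    """Shift the embedding by epsilon along the L2-normalized gradient,
    i.e. in the direction that locally increases the loss the most."""
    norm = math.sqrt(sum(g * g for g in gradient))
    if norm == 0.0:
        return list(embedding)  # zero gradient: no direction to follow
    return [e + epsilon * g / norm for e, g in zip(embedding, gradient)]

# Hypothetical values for a single 3-dimensional token embedding.
embedding = [0.2, -0.5, 1.0]
weights = [1.0, 2.0, -1.0]
target = 0.5
epsilon = 0.1

grad = loss_gradient(embedding, weights, target)
adv_embedding = perturb(embedding, grad, epsilon)
clean_loss = loss(embedding, weights, target)
adv_loss = loss(adv_embedding, weights, target)
print(f"clean loss: {clean_loss:.3f}, perturbed loss: {adv_loss:.3f}")
```

In adversarial training more generally, a model is also optimized on such perturbed embeddings so that small input variations change its output less. How AdvSumm combines clean and perturbed inputs is not stated in the abstract, so that step is left out of the sketch.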