Abstract
To date, most abstractive summarisation models have relied on variants of the negative log-likelihood (NLL) as their training objective. In some cases, reinforcement learning has been added to train the models with an objective that is closer to their evaluation measures (e.g. ROUGE). However, the reward function to be used within the reinforcement learning approach can play a key role for performance and is still partially unexplored. For this reason, in this paper, we propose two reward functions for the task of abstractive summarisation: the first function, referred to as RwB-Hinge, dynamically selects the samples for the gradient update. The second function, nicknamed RISK, leverages a small pool of strong candidates to inform the reward. In the experiments, we probe the proposed approach by fine-tuning an NLL pre-trained model over nine summarisation datasets of diverse size and nature. The experimental results show a consistent improvement over the negative log-likelihood baselines.