PGA-SciRE: Harnessing LLM on Data Augmentation for Enhancing Scientific Relation Extraction

Abstract

Relation Extraction (RE) aims to recognize the relation between pairs of entities mentioned in a text. Advances in large language models (LLMs) have had a tremendous impact on NLP. In this work, we propose a textual data augmentation framework called PGA for improving the performance of RE models in the scientific domain. The framework introduces two ways of augmenting data. The first uses an LLM to paraphrase the original training samples, yielding pseudo-samples that keep the sentence meaning but differ in representation and form. The second instructs the LLM to generate sentences that implicitly contain the information of the corresponding labels, based on the relation and entities of the original training samples. Each of these two kinds of pseudo-samples is used, together with the original dataset, to train the RE model. In our experiments, the PGA framework improves the F1 scores of three mainstream RE models in the scientific domain. In addition, using an LLM to obtain samples can effectively reduce the cost of manually labeling data.
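
The two augmentation routes described in the abstract can be illustrated with a minimal Python sketch. Everything concrete here is an assumption made for illustration, not the authors' implementation: the `RESample` schema, the prompt wording, the `llm` callable (any function mapping a prompt string to a completion string), and the filter that drops outputs which lose an entity mention.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class RESample:
    """One relation-extraction training sample (hypothetical schema)."""
    sentence: str
    head: str
    tail: str
    relation: str


def paraphrase_prompt(sample: RESample) -> str:
    # Route 1: same meaning, different surface form.
    # Assumed wording; the abstract does not give the actual prompt.
    return (
        "Paraphrase the sentence below. Keep its meaning and keep the "
        f"entity mentions '{sample.head}' and '{sample.tail}' unchanged.\n"
        f"Sentence: {sample.sentence}"
    )


def generation_prompt(sample: RESample) -> str:
    # Route 2: a new sentence built only from the entities and the label.
    # Assumed wording as well.
    return (
        f"Write one scientific sentence that mentions '{sample.head}' and "
        f"'{sample.tail}' and implicitly expresses the relation "
        f"'{sample.relation}' between them."
    )


def augment(
    samples: List[RESample],
    llm: Callable[[str], str],
    make_prompt: Callable[[RESample], str],
) -> List[RESample]:
    """Build pseudo-samples that reuse the labels of the originals."""
    pseudo = []
    for s in samples:
        text = llm(make_prompt(s)).strip()
        # Assumed sanity filter: keep the output only if both entity
        # mentions survive, so the original label still applies.
        if s.head in text and s.tail in text:
            pseudo.append(RESample(text, s.head, s.tail, s.relation))
    return pseudo
```

Under these assumptions, one training set would be `train + augment(train, llm, paraphrase_prompt)` and another `train + augment(train, llm, generation_prompt)`, each used to train the RE model alongside the original data.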
