GoSum: Extractive Summarization of Long Documents by Reinforcement Learning and Graph Organized discourse state

Abstract

Extracting summaries from long documents can be regarded as sentence classification that draws on the structural information of the documents, but how to use such structural information to summarize a document remains challenging. In this paper, we propose GoSum, a novel graph- and reinforcement-learning-based extractive model for long-paper summarization. In particular, GoSum encodes sentence states in reinforcement learning by building a heterogeneous graph for each input document at different discourse levels. Edges in the graph reflect the discourse hierarchy of the document, which restrains semantic drift across section boundaries. We evaluate GoSum on two scientific-article summarization datasets: PubMed and arXiv. The experimental results demonstrate that GoSum achieves state-of-the-art results compared with strong extractive and abstractive baselines. Ablation studies further confirm that GoSum's performance benefits from the use of discourse information.
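
The abstract does not give the exact graph schema, so the sketch below is only a hypothetical illustration of the general idea: a per-document heterogeneous graph with sentence nodes and section nodes, where hierarchy edges link each sentence to its section and sentence-to-sentence edges never cross a section boundary. The function names `build_discourse_graph` and `encode_states`, the two node types, and the one-step mean-pooling "state" are assumptions made for illustration, not GoSum's actual encoder or its reinforcement-learning policy.

```python
# Hypothetical node keys: ("sent", i) for sentence i, ("sec", j) for section j.
Node = tuple[str, int]


def build_discourse_graph(sections: list[list[str]]) -> dict[Node, set[Node]]:
    """Build an undirected sentence/section graph from a sectioned document.

    Assumed edge types (illustrative only):
      * sentence -- section: membership in the discourse hierarchy
      * sentence -- sentence: only between sentences of the same section,
        so no direct edge crosses a section boundary.
    """
    adj: dict[Node, set[Node]] = {}
    sent_id = 0
    for sec_id, sents in enumerate(sections):
        sec = ("sec", sec_id)
        adj.setdefault(sec, set())
        members = []
        for _ in sents:
            s = ("sent", sent_id)
            adj.setdefault(s, set()).add(sec)  # hierarchy edge: sentence -> section
            adj[sec].add(s)                    # and section -> sentence
            members.append(s)
            sent_id += 1
        # Intra-section sentence edges only.
        for a in members:
            for b in members:
                if a != b:
                    adj[a].add(b)
    return adj


def mean(vectors: list[list[float]]) -> list[float]:
    """Element-wise mean of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def encode_states(adj: dict[Node, set[Node]],
                  feats: dict[int, list[float]]) -> dict[int, list[float]]:
    """Hypothetical one-step encoder: a section's feature is the mean of its
    sentences' features; a sentence's state concatenates its own feature
    with its section's feature."""
    sec_feat: dict[int, list[float]] = {}
    for (kind, idx), nbrs in adj.items():
        if kind == "sec":
            members = [feats[i] for k, i in nbrs if k == "sent"]
            if members:
                sec_feat[idx] = mean(members)
    states: dict[int, list[float]] = {}
    for (kind, idx), nbrs in adj.items():
        if kind == "sent":
            (sec_id,) = [i for k, i in nbrs if k == "sec"]  # exactly one section
            states[idx] = feats[idx] + sec_feat[sec_id]
    return states


if __name__ == "__main__":
    doc = [["Intro s0.", "Intro s1."], ["Method s2."]]
    g = build_discourse_graph(doc)
    print(("sent", 2) in g[("sent", 0)])  # False: different sections
    feats = {0: [1.0, 0.0], 1: [0.0, 1.0], 2: [4.0, 4.0]}
    print(encode_states(g, feats)[0])     # [1.0, 0.0, 0.5, 0.5]
```

In this toy setup, sentences 0 and 2 can only influence each other through their section nodes, which is one simple way to picture how hierarchy-aware edges could limit semantic drift across sections. A real model would replace the mean pooling with learned graph layers and feed the resulting states to the extraction policy.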
