WikiGraphs: A Wikipedia Text - Knowledge Graph Paired Dataset

Abstract

We present a new dataset of Wikipedia articles, each paired with a knowledge graph, to facilitate research in conditional text generation, graph generation and graph representation learning. Existing graph-text paired datasets typically contain small graphs and short text (one or a few sentences), thus limiting the capabilities of the models that can be learned on the data. Our new dataset, WikiGraphs, is collected by pairing each Wikipedia article from the established WikiText-103 benchmark (Merity et al., 2016) with a subgraph from the Freebase knowledge graph (Bollacker et al., 2008). This makes it easy to benchmark against other state-of-the-art text generative models that are capable of generating long paragraphs of coherent text. Both the graphs and the text data are of significantly larger scale compared to prior graph-text paired datasets. We present baseline graph neural network and transformer model results on our dataset for 3 tasks: graph -> text generation, graph -> text retrieval and text -> graph retrieval. We show that better conditioning on the graph provides gains in generation and retrieval quality, but there is still considerable room for improvement.
