TEG-DB: A Comprehensive Dataset and Benchmark of Textual-Edge Graphs

Abstract

Text-Attributed Graphs (TAGs) augment graph structures with natural language descriptions, facilitating detailed depictions of data and their interconnections across various real-world settings. However, existing TAG datasets predominantly feature textual information only at the nodes, with edges typically represented by mere binary or categorical attributes. This lack of rich textual edge annotations significantly limits the exploration of contextual relationships between entities, hindering deeper insights into graph-structured data. To address this gap, we introduce Textual-Edge Graphs Datasets and Benchmark (TEG-DB), a comprehensive and diverse collection of benchmark textual-edge datasets featuring rich textual descriptions on nodes and edges. The TEG-DB datasets are large-scale and encompass a wide range of domains, from citation networks to social networks. In addition, we conduct extensive benchmark experiments on TEG-DB to assess the extent to which current techniques, including pre-trained language models, graph neural networks, and their combinations, can utilize textual node and edge information. Our goal is to elicit advancements in textual-edge graph research, specifically in developing methodologies that exploit rich textual node and edge descriptions to enhance graph analysis and provide deeper insights into complex real-world networks. The entire TEG-DB project is publicly available as an open-source repository on GitHub at https://github.com/Zhuofeng-Li/TEG-Benchmark.
