GMLM: Bridging Graph Neural Networks and Language Models for Heterophilic Node Classification

  • 2025-06-02 09:42:48
  • Aarush Sinha, OM Kumar CU

Abstract

Integrating structured graph data with rich textual information from nodes poses a significant challenge, particularly for heterophilic node classification. Current approaches often struggle with computational costs or effective fusion of disparate modalities. We propose Graph Masked Language Model (GMLM), a novel architecture efficiently combining Graph Neural Networks (GNNs) with Pre-trained Language Models (PLMs). GMLM introduces three key innovations: (i) a dynamic active node selection strategy for scalable PLM text processing; (ii) a GNN-specific contrastive pretraining stage using soft masking with a learnable graph [MASK] token for robust structural representations; and (iii) a dedicated fusion module integrating RGCN-based GNN embeddings with PLM (GTE-Small & DistilBERT) embeddings. Extensive experiments on heterophilic benchmarks (Cornell, Wisconsin, Texas) demonstrate GMLM's superiority. Notably, GMLM (DistilBERT) achieves significant performance gains, improving accuracy by over 4.7% on Cornell and over 2.0% on Texas compared to the previous best-performing baselines. This work underscores the benefits of targeted PLM engagement and modality-specific pretraining for improved, efficient learning on text-rich graphs.
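
To make two of the ideas above concrete, the following is a minimal sketch, not the authors' implementation. The abstract does not specify how active nodes are chosen or how soft masking is computed, so everything here is an assumption made for illustration: uniform random selection, a linear blend controlled by a hypothetical coefficient `alpha`, and the helper names `select_active_nodes` and `soft_mask`. In a real model the mask token would be a trainable parameter rather than a fixed list.

```python
import random


def select_active_nodes(num_nodes, ratio, rng):
    """Pick a random subset of node indices (assumed selection rule).

    Only these nodes would have their text passed through the PLM.
    """
    k = max(1, int(round(ratio * num_nodes)))
    return sorted(rng.sample(range(num_nodes), k))


def soft_mask(features, masked_nodes, mask_token, alpha):
    """Blend the feature vectors of masked nodes with a mask token.

    alpha = 0 leaves a vector unchanged; alpha = 1 replaces it entirely.
    Nodes outside masked_nodes are copied through untouched.
    """
    masked_set = set(masked_nodes)
    out = []
    for i, x in enumerate(features):
        if i in masked_set:
            out.append([(1 - alpha) * xi + alpha * mi
                        for xi, mi in zip(x, mask_token)])
        else:
            out.append(list(x))
    return out


rng = random.Random(0)
features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, 0.5]]
mask_token = [0.0, 0.0]  # stand-in for a learnable [MASK] embedding

# Nodes whose text would be sent to the PLM in this step.
active = select_active_nodes(len(features), ratio=0.5, rng=rng)

# Nodes whose GNN input features are softly masked during pretraining.
masked_nodes = [0, 2]
masked = soft_mask(features, masked_nodes, mask_token, alpha=0.8)
```

Keeping the two index sets separate reflects that the abstract presents active node selection (for PLM text processing) and soft masking (for GNN contrastive pretraining) as distinct components; whether the paper couples them is not stated here.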
