Abstract
Self-supervised learning (SSL) is an emerging paradigm that exploits supervisory signals generated from the data itself, and many recent studies have leveraged SSL to conduct graph anomaly detection. However, we empirically found that three important factors can substantially impact detection performance across datasets: 1) the specific SSL strategy employed; 2) the tuning of the strategy's hyperparameters; and 3) the allocation of combination weights when using multiple strategies. Most SSL-based graph anomaly detection methods circumvent these issues by arbitrarily or selectively (i.e., guided by label information) choosing SSL strategies, hyperparameter settings, and combination weights. While an arbitrary choice may lead to subpar performance, using label information in an unsupervised setting constitutes label information leakage and leads to severe overestimation of a method's performance. Leakage has been criticized as "one of the top ten data mining mistakes", yet many recent studies on SSL-based graph anomaly detection have been using label information to select hyperparameters. To mitigate this issue, we propose to use an internal evaluation strategy (with theoretical analysis) to select hyperparameters in SSL for unsupervised anomaly detection. We perform extensive experiments using 10 recent SSL-based graph anomaly detection algorithms on various benchmark datasets, demonstrating both the prior issues with hyperparameter selection and the effectiveness of our proposed strategy.
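As a hedged illustration of why label-guided selection overestimates performance, the Python sketch below simulates a detector for which every hyperparameter configuration produces pure-noise anomaly scores. It is not the proposed internal evaluation strategy, whose details are not given here. The node counts, the 50 configurations, the fixed seed, and the `roc_auc` helper are hypothetical choices made only for this illustration.

```python
import numpy as np


def roc_auc(labels: np.ndarray, scores: np.ndarray) -> float:
    """ROC-AUC via the Mann-Whitney U statistic (assumes no tied scores)."""
    order = np.argsort(scores)
    ranks = np.empty(len(scores), dtype=float)
    ranks[order] = np.arange(1, len(scores) + 1)  # rank 1 = lowest score
    n_pos = int(labels.sum())
    n_neg = len(labels) - n_pos
    u = ranks[labels == 1].sum() - n_pos * (n_pos + 1) / 2
    return u / (n_pos * n_neg)


rng = np.random.default_rng(0)
n_nodes, n_anomalies, n_configs = 500, 25, 50  # hypothetical sizes

labels = np.zeros(n_nodes, dtype=int)
labels[:n_anomalies] = 1  # ground-truth anomalies (hidden in a real unsupervised setting)

# Hypothetical detector: each hyperparameter configuration yields scores with
# no signal at all, so the true expected AUC of every configuration is 0.5.
aucs = np.array([roc_auc(labels, rng.random(n_nodes)) for _ in range(n_configs)])

mean_auc = aucs.mean()   # expected result of a label-free choice of configuration
leaked_auc = aucs.max()  # result reported when picking the config with the best test AUC

print(f"average AUC over configurations: {mean_auc:.3f}")
print(f"AUC of the label-selected configuration: {leaked_auc:.3f}")
```

Every configuration is uninformative, yet reporting the best test AUC yields a number noticeably above 0.5; the gap comes entirely from selection bias introduced by peeking at the labels.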