Abstract
We introduce Chi-Geometry, a library that generates graph data for testing and benchmarking the ability of GNNs to predict chirality. Chi-Geometry generates synthetic graph samples with (i) user-specified geometric and topological traits, to isolate particular types of samples, and (ii) randomized node positions and species, to minimize extraneous correlations. Each generated graph contains exactly one chiral center, labeled either R or S, while all other nodes are labeled N/A (non-chiral). The generated samples are then combined into a cohesive dataset that can be used to assess a GNN's ability to predict chirality as a node classification task. Chi-Geometry enables more interpretable and less confounded benchmarking of GNNs for chirality prediction, which can guide the design of new GNN architectures with improved predictive performance. We illustrate Chi-Geometry's efficacy by using it to generate synthetic datasets for benchmarking various state-of-the-art (SOTA) GNN architectures. The conclusions drawn from these benchmarking results guided our design of two new GNN architectures. The first architecture establishes all-to-all connections in the graph and accurately predicts chirality across all the challenging configurations where the previously tested SOTA models failed, but at a computational cost (for both training and inference) that grows quadratically with the number of graph nodes. The second architecture avoids all-to-all connections by introducing a virtual node into the original graph structure of the data, which restores linear scaling of the training and inference cost with respect to the number of nodes, while still achieving chirality-detection accuracy competitive with SOTA GNN architectures.
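The labeling scheme and the scaling contrast between the two architectures can be made concrete with a small sketch. It is illustrative only and is not Chi-Geometry's code: the helper names (`node_labels`, `all_to_all_edges`, `add_virtual_node`) are hypothetical, graphs are assumed to be stored as directed edge lists, and a simple chain stands in for a generated sample. All-to-all connectivity produces n(n-1) directed edges, whereas attaching one virtual node to every real node adds only 2n.

```python
def node_labels(n: int, chiral_index: int, configuration: str) -> list[str]:
    # Hypothetical helper: one chiral center labeled "R" or "S"; every other node is "N/A".
    assert configuration in ("R", "S")
    labels = ["N/A"] * n
    labels[chiral_index] = configuration
    return labels


def all_to_all_edges(n: int) -> list[tuple[int, int]]:
    # Directed edge between every ordered pair of distinct nodes: n * (n - 1) edges, O(n^2).
    return [(i, j) for i in range(n) for j in range(n) if i != j]


def add_virtual_node(edges: list[tuple[int, int]], n: int) -> list[tuple[int, int]]:
    # Append one virtual node (index n) linked to and from every real node: +2n edges, O(n).
    v = n
    return edges + [(i, v) for i in range(n)] + [(v, i) for i in range(n)]


def chain_edges(n: int) -> list[tuple[int, int]]:
    # Stand-in for a sparse sample graph: a bidirectional chain with 2 * (n - 1) edges.
    return [(i, i + 1) for i in range(n - 1)] + [(i + 1, i) for i in range(n - 1)]


if __name__ == "__main__":
    print(node_labels(5, 2, "R"))
    for n in (10, 100):
        sparse = chain_edges(n)
        print(n, len(sparse), len(add_virtual_node(sparse, n)), len(all_to_all_edges(n)))
```

Going from 10 to 100 nodes multiplies the virtual-node edge count by roughly 10, while the all-to-all edge count grows by roughly 100. This mirrors the linear versus quadratic cost described above.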